DeepSeek-V3

mirror of https://github.com/deepseek-ai/DeepSeek-V3.git synced 2025-04-19 18:18:57 -04:00

Author	SHA1	Message	Date
Gabriel Caetano	a3f30dc59c	Merge branch 'main' into main	2025-04-08 22:46:24 -03:00
Gabriel Caetano	a7bab5c920	Clean up and optimize Triton FP8 kernels: improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM); added comments for clarity; replaced hardcoded block sizes with configurable parameters; improved safety using tl.cdiv and masking; renamed variables and ensured consistency in naming	2025-04-08 22:33:48 -03:00
Gabriel Caetano	61790e1653	Update 2. Here are the improvements made to the code for your commit message: Refactored init_distributed function: extracted distributed setup logic into a separate function. Updated sample function: replaced exponential approach with torch.multinomial for sampling. Improved argument validation: replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided. Refactored interactive mode logic: maintained user interaction logic but moved init_distributed call to the beginning of main.	2025-01-31 19:33:00 -03:00
Roman Fitzjalen	2756e130c2	clarify assertion error	2025-01-28 13:16:54 +01:00
enoch kan	a1296f099e	Enhance documentation and update .gitignore for model conversion scripts	2025-01-05 18:18:18 +00:00
stack-heap-overflow	4c2fdb8f55	Release DeepSeek-V3	2024-12-26 19:01:57 +08:00