Commit Graph

6 Commits

Author SHA1 Message Date
Gabriel Caetano
a3f30dc59c Merge branch 'main' into main 2025-04-08 22:46:24 -03:00
Gabriel Caetano
a7bab5c920 Clean up and optimize Triton FP8 kernels
- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using tl.cdiv and masking
- Renamed variables and ensured consistency in naming
2025-04-08 22:33:48 -03:00
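The "tl.cdiv and masking" item in the commit above can be illustrated with a minimal pure-Python sketch. This is not the Triton kernel from the commit. The names `cdiv`, `scale_blocked`, and `block_size` are hypothetical, and the loop over `pid` only imitates how a grid of program instances would each handle one block. It assumes the usual pattern: launch a ceiling-divided number of blocks, then mask out offsets that fall past the end of the data.

```python
def cdiv(n, block):
    """Ceiling division: how many blocks of size `block` cover n elements."""
    return (n + block - 1) // block


def scale_blocked(values, scale, block_size=4):
    """Multiply each element by `scale`, one block at a time (illustrative only)."""
    n = len(values)
    out = [0.0] * n
    # One iteration per block, analogous to one program instance per block.
    for pid in range(cdiv(n, block_size)):
        offsets = [pid * block_size + i for i in range(block_size)]
        # The mask marks offsets that are in range; the last block may be partial.
        mask = [off < n for off in offsets]
        for off, in_range in zip(offsets, mask):
            if in_range:
                out[off] = values[off] * scale
    return out
```

With five elements and a block size of four, `cdiv(5, 4)` gives two blocks, and the mask stops the second block from touching offsets 5 through 7. Without ceiling division the trailing element would be skipped. Without the mask the last block would read out of bounds.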
Gabriel Caetano
61790e1653 Update 2
Here are the improvements made to the code for your commit message:

- Refactored init_distributed function: Extracted distributed setup logic into a separate function.
- Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
- Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
- Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.
2025-01-31 19:33:00 -03:00
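The sampling change described in the commit above can be sketched with the Python standard library alone. The sketch assumes that "exponential approach" means dividing each probability by independent Exp(1) noise and taking the argmax. The commit message does not spell this out. `random.choices` stands in here for `torch.multinomial`, and all function names are hypothetical. Under that assumption both samplers draw index `i` with probability `probs[i]`, so the change affects readability more than the distribution.

```python
import random


def sample_exponential(probs, rng):
    """Draw an index by dividing each probability by Exp(1) noise, then argmax."""
    scores = [p / rng.expovariate(1.0) for p in probs]
    return max(range(len(probs)), key=scores.__getitem__)


def sample_multinomial(probs, rng):
    """Draw an index directly from the categorical distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def empirical(sampler, probs, n, seed):
    """Return the observed frequency of each index over n draws."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(n):
        counts[sampler(probs, rng)] += 1
    return [c / n for c in counts]
```

Calling `empirical(sample_exponential, [0.1, 0.2, 0.7], 20000, seed=0)` and the same with `sample_multinomial` should give frequencies close to the input probabilities in both cases.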
Roman Fitzjalen
2756e130c2 clarify assertion error 2025-01-28 13:16:54 +01:00
enoch kan
a1296f099e Enhance documentation and update .gitignore for model conversion scripts 2025-01-05 18:18:18 +00:00
stack-heap-overflow
4c2fdb8f55 Release DeepSeek-V3 2024-12-26 19:01:57 +08:00