DeepSeek-V3/inference
Gabriel Caetano a7bab5c920 Clean up and optimize Triton FP8 kernels
- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using tl.cdiv and masking
- Renamed variables and ensured consistency in naming
2025-04-08 22:33:48 -03:00
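The commit notes above mention tl.cdiv and masking. As a rough illustration of why the two go together, here is a minimal pure-Python sketch, not the code in kernel.py: it assumes a flat list of quantized values with one scale per block, and the names cdiv, dequant_blockwise, and block_size are hypothetical. Ceiling division sizes the grid so that a ragged final block is still covered, and the mask keeps that block from touching elements past the end.

```python
def cdiv(n: int, block: int) -> int:
    """Ceiling division, the same arithmetic that tl.cdiv performs."""
    return (n + block - 1) // block


def dequant_blockwise(q, scales, block_size=4):
    """Dequantize a flat list of values using one scale per block.

    Hypothetical stand-in for a GPU kernel: each loop iteration plays the
    role of one program instance that handles one block of elements.
    """
    n = len(q)
    num_blocks = cdiv(n, block_size)  # grid size also covers a partial last block
    assert len(scales) == num_blocks, "expected exactly one scale per block"
    out = [0.0] * n
    for pid in range(num_blocks):
        offsets = [pid * block_size + i for i in range(block_size)]
        mask = [off < n for off in offsets]  # False for lanes past the end
        for off, valid in zip(offsets, mask):
            if valid:  # masked lanes neither read nor write
                out[off] = q[off] * scales[pid]
    return out


if __name__ == "__main__":
    # Six elements with block_size=4 need two blocks; the second is half empty.
    print(dequant_blockwise([1, 2, 3, 4, 5, 6], [0.5, 2.0], block_size=4))
```

With plain floor division the grid would contain a single block and the last two elements would be left unprocessed; without the mask, the second block would index past the end of the input.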
configs           Release DeepSeek-V3                                                       2024-12-26 19:01:57 +08:00
convert.py        Enhance documentation and update .gitignore for model conversion scripts  2025-01-05 18:18:18 +00:00
fp8_cast_bf16.py  Enhance documentation and update .gitignore for model conversion scripts  2025-01-05 18:18:18 +00:00
generate.py       Change                                                                    2025-01-30 22:47:39 -03:00
kernel.py         Clean up and optimize Triton FP8 kernels                                  2025-04-08 22:33:48 -03:00
model.py          Updated model.py docstrings                                               2025-01-05 18:24:31 +00:00
requirements.txt  Release DeepSeek-V3                                                       2024-12-26 19:01:57 +08:00