Mirror of https://github.com/deepseek-ai/DeepSeek-V3.git (synced 2025-04-20 02:28:57 -04:00)
- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using `tl.cdiv` and masking
- Renamed variables and ensured consistency in naming

| Name |
|---|
| configs |
| convert.py |
| fp8_cast_bf16.py |
| generate.py |
| kernel.py |
| model.py |
| requirements.txt |
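
The commit summary above mentions configurable block sizes, `tl.cdiv`, and masking. As a rough illustration of why those three go together, here is a minimal plain-Python sketch of block-wise dequantization. It is not the code in `kernel.py`: the names `cdiv`, `dequant_blockwise`, and `block_size` are hypothetical, ordinary floats stand in for FP8 values, and nested loops stand in for Triton program instances.

```python
def cdiv(n, d):
    """Ceiling division: the same arithmetic that tl.cdiv performs."""
    return (n + d - 1) // d


def dequant_blockwise(weights, scales, block_size=2):
    """Scale each block_size x block_size tile of `weights` by its own factor.

    weights: M x N list of rows (floats standing in for FP8 values).
    scales:  cdiv(M, block_size) x cdiv(N, block_size) list of rows.
    block_size is a parameter rather than a hardcoded constant.
    """
    m, n = len(weights), len(weights[0])
    out = [[0.0] * n for _ in range(m)]
    # cdiv rounds up, so a partial tile at the edge still gets visited.
    for bi in range(cdiv(m, block_size)):
        for bj in range(cdiv(n, block_size)):
            scale = scales[bi][bj]
            for r in range(bi * block_size, (bi + 1) * block_size):
                for c in range(bj * block_size, (bj + 1) * block_size):
                    # Mask: ignore tile positions that fall past the matrix edge.
                    if r < m and c < n:
                        out[r][c] = weights[r][c] * scale
    return out


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    print(dequant_blockwise(ones, [[1.0, 2.0], [3.0, 4.0]], block_size=2))
```

With a 3 x 3 matrix and a block size of 2, floor division would launch only one tile per axis and leave the last row and column untouched. Rounding up covers them, and the mask keeps the oversized edge tiles from reading or writing out of bounds.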