- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using tl.cdiv and masking
- Renamed variables and ensured consistency in naming
fix(table): correct bold formatting for TriviaQA EM comparison
- Remove redundant bolding on LLaMA3.1 405B (82.7)
- Retain single bold style for DeepSeek-V3's highest score (82.9)
- Aligns with evaluation convention of highlighting only the best performance
Here are the improvements made to the code for your commit message:
Refactored init_distributed function: Extracted distributed setup logic into a separate function.
Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.
Changes:
init_distributed function: Extracted the distributed setup logic into a separate function.
sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling.
Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided.
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main.
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.