# DeepSeek-V3 Weight File Documentation

## New Fields in [`config.json`](https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json)

- **model_type**: Specifies the model type, set to `deepseek_v3` in this release.
- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module**.
- **quantization_config**: Describes the configuration for FP8 quantization (see the sketch after this list).
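
As a quick check, the minimal sketch below loads the configuration and prints the new fields; the local path is illustrative, not a fixed location.

```python
import json

# Minimal sketch: inspect the new fields after downloading the weights.
# The path below is illustrative.
with open("DeepSeek-V3/config.json") as f:
    config = json.load(f)

print(config["model_type"])                # "deepseek_v3"
print(config["num_nextn_predict_layers"])  # 1 MTP Module in the open-sourced weights
print(config["quantization_config"])       # FP8 quantization settings (described below)
```
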
The DeepSeek-V3 weight file consists of two main components: **Main Model Weights** and **Multi-Token Prediction (MTP) Modules**.

### 2. Multi-Token Prediction (MTP) Modules

- **Composition**:
  - Additional MTP Modules defined by the `num_nextn_predict_layers` field. For this model, it is set to 1.
- **Parameter Count**:
  - Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
  - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).

The FP8 weight file introduces a `quantization_config` field to describe the quantization method.
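
As an illustrative sketch only (the key names below are inferred from the fields described next, not copied from the published `config.json`), the field could look roughly like this:

```python
# Hypothetical structure for illustration; the published config.json is authoritative.
quantization_config = {
    "quant_method": "fp8",            # format type: fp8
    "fmt": "e4m3",                    # corresponds to torch.float8_e4m3fn
    "weight_block_size": [128, 128],  # 128x128 weight blocks
    "activation_scheme": "dynamic",   # dynamic activation quantization
}
```
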
- **Quantization Format**:
  - Format type: `fp8` and `e4m3` (corresponds to `torch.float8_e4m3fn`).
  - Weight block size: `128x128`.
- **Activation Quantization Scheme**:
  - Utilizes dynamic activation quantization (`dynamic`).

The FP8 weight file includes a `weight_scale_inv` field, which stores the dequantization scale for each weight block.

- **Storage Format**: `float32 Tensor`, stored alongside the weight data.
- **Dequantization Formula**:
  - If a weight block is not aligned to 128, it is zero-padded to 128 before the scale is calculated. After quantization, the padded portion is removed.
  - Dequantization is performed as `(128x128 weight block) * weight_scale_inv` (see the sketch below).
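
The snippet below is a minimal PyTorch sketch of this block-wise dequantization, assuming a 2-D FP8 weight and a `weight_scale_inv` tensor holding one `float32` scale per 128x128 block; it is not the repository's official kernel.

```python
import torch

def dequantize_weight(w_fp8: torch.Tensor,
                      weight_scale_inv: torch.Tensor,
                      block_size: int = 128) -> torch.Tensor:
    """Multiply each 128x128 block of the FP8 weight by its float32 scale."""
    w = w_fp8.to(torch.float32)  # upcast FP8 values before scaling
    out = torch.empty_like(w)
    rows, cols = w.shape
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            # Edge blocks may be smaller than 128x128: their scale was computed on a
            # zero-padded block, so multiplying the unpadded slice gives the same result.
            scale = weight_scale_inv[i // block_size, j // block_size]
            out[i:i + block_size, j:j + block_size] = w[i:i + block_size, j:j + block_size] * scale
    return out
```

In practice this scaling would be fused into the matmul kernel; the explicit loop is only meant to make the per-block formula visible.
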
Through dequantization of the FP8 weights, runtime operations enable online quantization at a granularity of `per-token-per-128-channel` (a minimal sketch follows).
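
As a rough illustration of what dynamic quantization at `per-token-per-128-channel` granularity can look like (a sketch under the stated assumptions, not the project's actual kernel):

```python
import torch

def quantize_activation(x: torch.Tensor, group_size: int = 128):
    """Quantize activations to FP8 with one dynamic scale per (token, 128-channel group)."""
    tokens, channels = x.shape
    assert channels % group_size == 0, "channels assumed to be a multiple of 128 here"
    groups = x.view(tokens, channels // group_size, group_size)
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3
    # One scale per token per 128-channel group, computed on the fly (dynamic).
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / fp8_max
    x_fp8 = (groups / scale).to(torch.float8_e4m3fn).view(tokens, channels)
    return x_fp8, scale.squeeze(-1)  # scales are kept for later dequantization
```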