diff --git a/README_WEIGHTS.md b/README_WEIGHTS.md
index 5679083..77a8f63 100644
--- a/README_WEIGHTS.md
+++ b/README_WEIGHTS.md
@@ -1,8 +1,8 @@
 # DeepSeek-V3 Weight File Documentation
 
-## New Fields in `config.json`
+## New Fields in [`config.json`](https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json)
 
-- **model_type**: Specifies the model type, which is updated to `deepseek_v3` in this release.
-- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module** .
+- **model_type**: Specifies the model type, set to `deepseek_v3` in this release.
+- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module**.
 - **quantization_config**: Describes the configuration for FP8 quantization.
 
@@ -33,7 +33,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 ### 2. Multi-Token Prediction (MTP) Modules
 
 - **Composition**:
-  - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
+  - Additional MTP Modules defined by the `num_nextn_predict_layers` field. For this model, it is set to 1.
 - **Parameter Count**:
-  - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
+  - Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
   - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).
@@ -75,7 +75,7 @@ The FP8 weight file introduces a `quantization_config` field to describe the qua
 ```
 
 - **Quantization Format**:
-  - Format type: `fp8` and `e4m3` (corresponding to `torch.float8_e4m3fn`).
+  - Format type: `fp8` and `e4m3` (corresponds to `torch.float8_e4m3fn`).
   - Weight block size: `128x128`.
 - **Activation Quantization Scheme**:
   - Utilizes dynamic activation quantization (`dynamic`).
@@ -86,8 +86,8 @@ The FP8 weight file includes a `weight_scale_inv` field, which stores the dequan
 
 - **Storage Format**: `float32 Tensor`, stored alongside the weight data.
 - **Dequantization Formula**:
-  - If the weight block is not aligned to 128, it is zero-padded to 128 before calculating the scale. After quantization, the padded portion is removed.
-  - The dequantization process is performed as: `(128x128 weight block) * weight_scale_inv`.
+  - If a weight block is not aligned to 128, it is zero-padded to 128 before the scale is calculated. After quantization, the padded portion is removed.
+  - Dequantization is performed as `(128x128 weight block) * weight_scale_inv`.
 
 Through dequantization of the FP8 weights, runtime operations enable online quantization at a granularity of `per-token-per-128-channel`.
 
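The block-wise dequantization rule edited in the last hunk maps naturally to a small routine. Below is a minimal sketch, assuming PyTorch (for `torch.float8_e4m3fn`) and a 2-D FP8 weight whose companion `weight_scale_inv` tensor holds one float32 scale per 128x128 block; the function name `dequantize_fp8_block` and the loop structure are illustrative, not taken from the repository's inference code.

```python
import torch

def dequantize_fp8_block(weight_fp8: torch.Tensor,
                         weight_scale_inv: torch.Tensor,
                         block_size: int = 128) -> torch.Tensor:
    """Dequantize a 2-D FP8 (e4m3) weight using per-128x128-block scales.

    weight_fp8:       tensor stored as torch.float8_e4m3fn, shape (rows, cols)
    weight_scale_inv: float32 tensor, shape (ceil(rows/128), ceil(cols/128)),
                      one dequantization scale per weight block (assumed layout)
    """
    rows, cols = weight_fp8.shape
    # FP8 tensors do not support arithmetic directly, so upcast first.
    weight = weight_fp8.to(torch.float32)
    out = torch.empty_like(weight)
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            scale = weight_scale_inv[i // block_size, j // block_size]
            # Edge blocks may be smaller than 128x128; slicing clamps to the
            # tensor bounds, and the block's stored scale applies unchanged,
            # since the zero-padding only affected how the scale was computed.
            out[i:i + block_size, j:j + block_size] = \
                weight[i:i + block_size, j:j + block_size] * scale
    return out
```

The dequantized float32 result can then be cast to `bfloat16` for inference, or re-quantized online at `per-token-per-128-channel` granularity as the final paragraph of the section describes.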