From 0b39205aed0d4b1f9a2798e8551e4059c8f94b0c Mon Sep 17 00:00:00 2001
From: Abenezer Anglo <111607144+AbaSheger@users.noreply.github.com>
Date: Mon, 27 Jan 2025 18:50:55 +0100
Subject: [PATCH] Fix typos and ensure consistency in documentation
Correct minor typos and punctuation in `README.md` and `README_WEIGHTS.md`, and keep terminology consistent across both documents.

* **README.md**
  - Correct minor typos in the text.
* **README_WEIGHTS.md**
  - Remove a stray space before a period in the `num_nextn_predict_layers` description.
  - Balance the parentheses in the MTP Modules parameter-count bullet.
---
 README.md         | 2 +-
 README_WEIGHTS.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 7ecf87e..61c4e56 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@
-
+
diff --git a/README_WEIGHTS.md b/README_WEIGHTS.md
index 5679083..d6bdb1e 100644
--- a/README_WEIGHTS.md
+++ b/README_WEIGHTS.md
@@ -3,7 +3,7 @@
 ## New Fields in `config.json`
 
 - **model_type**: Specifies the model type, which is updated to `deepseek_v3` in this release.
-- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module** .
+- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module**.
 - **quantization_config**: Describes the configuration for FP8 quantization.
 
 ---
@@ -35,7 +35,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weights** and
 - **Composition**:
   - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-  - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
+  - Parameters: **11.5B unique parameters** (excluding the shared 0.9B Embedding and 0.9B output Head).
   - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).
 
 #### Structural Details