From 0b39205aed0d4b1f9a2798e8551e4059c8f94b0c Mon Sep 17 00:00:00 2001
From: Abenezer Anglo <111607144+AbaSheger@users.noreply.github.com>
Date: Mon, 27 Jan 2025 18:50:55 +0100
Subject: [PATCH] Fix typos and ensure consistency in documentation

Correct minor typos and ensure consistency in terminology in `README.md` and `README_WEIGHTS.md`.

* **README.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.

* **README_WEIGHTS.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/deepseek-ai/DeepSeek-V3?shareId=XXXX-XXXX-XXXX-XXXX).
---
 README.md         | 2 +-
 README_WEIGHTS.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7ecf87e..61c4e56 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@
     Discord
-    Wechat
+    WeChat
     Twitter Follow

diff --git a/README_WEIGHTS.md b/README_WEIGHTS.md
index 5679083..d6bdb1e 100644
--- a/README_WEIGHTS.md
+++ b/README_WEIGHTS.md
@@ -3,7 +3,7 @@
 ## New Fields in `config.json`

 - **model_type**: Specifies the model type, which is updated to `deepseek_v3` in this release.
-- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module** .
+- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module**.
 - **quantization_config**: Describes the configuration for FP8 quantization.

 ---
@@ -35,7 +35,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - **Composition**:
   - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-  - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
+  - Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head.
   - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).

 #### Structural Details
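
As a quick sanity check on the `config.json` fields discussed in the README_WEIGHTS.md hunks above, here is a minimal sketch that reads them from a local checkpoint. The field names and expected values come from the patched documentation; the file path is a hypothetical placeholder.

```python
import json

# Minimal sketch: inspect the config.json fields documented in README_WEIGHTS.md.
# The path below is hypothetical; point it at a local DeepSeek-V3 checkout.
with open("DeepSeek-V3/config.json") as f:
    config = json.load(f)

print(config["model_type"])                # expected: "deepseek_v3"
print(config["num_nextn_predict_layers"])  # expected: 1 (one MTP Module)
print(config.get("quantization_config"))   # FP8 quantization settings, if present
```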