Mirror of https://github.com/deepseek-ai/DeepSeek-V3.git, synced 2025-04-19 10:08:59 -04:00
Fix typos and ensure consistency in documentation
Correct minor typos and ensure consistency in terminology in `README.md` and `README_WEIGHTS.md`.

* **README.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.
* **README_WEIGHTS.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/deepseek-ai/DeepSeek-V3?shareId=XXXX-XXXX-XXXX-XXXX).
Parent: b5d872ead0
Commit: 0b39205aed
@@ -23,7 +23,7 @@
 <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
 </a>
 <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
-<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+<img alt="WeChat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
 <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
 <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
@@ -3,7 +3,7 @@
 ## New Fields in `config.json`
 
 - **model_type**: Specifies the model type, which is updated to `deepseek_v3` in this release.
-- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module** .
+- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules. The open-sourced V3 weights include **1 MTP Module**.
 - **quantization_config**: Describes the configuration for FP8 quantization.
 
 ---
@@ -35,7 +35,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - **Composition**:
 - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
 - **Parameter Count**:
-- Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head).
+- Parameters: **11.5B unique parameters**, excluding the shared 0.9B Embedding and 0.9B output Head.
 - Activation parameters: **2.4B** (including the shared 0.9B Embedding and 0.9B output Head).
 
 #### Structural Details
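
The `README_WEIGHTS.md` hunks above name three `config.json` fields. As a rough illustration of how a loader might read them, here is a minimal Python sketch. The helper `summarize_config` and the sample dictionary are hypothetical and not part of the repository; the `model_type` and `num_nextn_predict_layers` values are the ones stated in the text, while the contents of `quantization_config` are a placeholder, since the excerpt does not show its schema.

```python
import json

# Field names taken from the README_WEIGHTS.md excerpt in the diff.
FIELDS = ("model_type", "num_nextn_predict_layers", "quantization_config")


def summarize_config(config: dict) -> dict:
    """Return the three fields, using None for any that are absent."""
    return {name: config.get(name) for name in FIELDS}


# Hypothetical sample standing in for a real config.json file.
# The quantization_config body is a placeholder, not the real schema.
sample_text = json.dumps({
    "model_type": "deepseek_v3",
    "num_nextn_predict_layers": 1,
    "quantization_config": {"placeholder": True},
})

summary = summarize_config(json.loads(sample_text))
print(summary["model_type"])
print(summary["num_nextn_predict_layers"])
```

Using `dict.get` keeps the sketch tolerant of older configs that predate these fields; a stricter loader might raise an error instead.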