mirror of https://github.com/deepseek-ai/DeepSeek-V3.git
synced 2025-04-19 18:18:57 -04:00
Update README.md
Updated the introductory sentence in the "Introduction" section to improve clarity and readability.

Changed: "We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token."
To: "DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token."

This revision ensures conciseness and better emphasis on key details.
parent b5d872ead0
commit 81969b0a06
@@ -47,7 +47,7 @@
 
 ## 1. Introduction
 
-We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
+DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token.
 To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
 Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
 We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.