Update README.md

Updated the introductory sentence in the "Introduction" section to improve clarity and readability.

Changed:
"We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token."

To: "DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token."

This revision improves conciseness and places clearer emphasis on the model's key specifications.
Commit: 81969b0a06 (parent: b5d872ead0)
Author: Afueth Thomas, 2025-01-27 15:11:49 +05:30 (committed by GitHub)

@@ -47,7 +47,7 @@
 ## 1. Introduction
-We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
+DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token.
 To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.
 Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
 We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities.