Mirror of https://github.com/deepseek-ai/DeepSeek-V2.git
Update README.md
commit e23eeb51a8
parent 9c7aa9ce01
@@ -165,7 +165,7 @@ We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for liv
 
 ## 4. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
-- For attention, we design IEAttn, which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
+- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 
 <p align="center">
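The MLA line added in this commit describes low-rank joint compression of keys and values, so that only a small latent vector per token has to be kept in the inference-time KV cache. The sketch below is not taken from the DeepSeek-V2 codebase; it is a minimal PyTorch illustration of that idea, and the module name, dimensions (`d_model`, `d_latent`, head counts), and cache handling are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankKVCompression(nn.Module):
    """Illustrative sketch of low-rank key-value joint compression (not the
    DeepSeek-V2 implementation). The hidden state is down-projected to a small
    latent vector, which is the only per-token tensor kept in the cache; keys
    and values are reconstructed from it on the fly by up-projections."""

    def __init__(self, d_model=5120, n_heads=40, d_head=128, d_latent=512):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        # Joint down-projection: one small latent per token replaces full K and V in the cache.
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections recover per-head keys and values from the cached latent.
        self.k_up = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.v_up = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def forward(self, hidden, latent_cache=None):
        # hidden: (batch, new_tokens, d_model)
        latent = self.kv_down(hidden)                # (batch, new_tokens, d_latent)
        if latent_cache is not None:                 # append to previously cached latents
            latent = torch.cat([latent_cache, latent], dim=1)
        b, s, _ = latent.shape
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head)
        # Only `latent` needs to persist between decoding steps: d_latent floats
        # per token instead of 2 * n_heads * d_head for a conventional KV cache.
        return k, v, latent
```

With the illustrative sizes above, the cache stores 512 values per token instead of 2 × 40 × 128 = 10240, roughly a 20× reduction; MLA as described in the DeepSeek-V2 paper additionally handles rotary position embeddings through a decoupled component, which this sketch omits.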