Update README.md

Fuli Luo 2024-05-06 22:50:03 +08:00 committed by GitHub
parent 9c7aa9ce01
commit e23eeb51a8


@@ -165,7 +165,7 @@ We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges.
 ## 4. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference
-- For attention, we design IEAttn, which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
+- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 <p align="center">
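The key idea behind the renamed MLA (Multi-head Latent Attention) line above — caching a small shared latent instead of full per-head keys and values — can be sketched numerically. This is a minimal illustration, not the actual DeepSeek-V2 implementation: the matrix names (`W_dkv`, `W_uk`, `W_uv`) and all dimensions are assumptions chosen for clarity.

```python
import numpy as np

# Sketch of low-rank key-value union compression: instead of caching
# full K and V per token (2 * d_model floats each step), cache one
# compressed latent c_t = h_t @ W_dkv with d_c << d_model, and
# reconstruct K and V from the cache via up-projections when needed.
# All names and sizes below are illustrative assumptions.

rng = np.random.default_rng(0)
d_model, d_c, seq_len = 64, 8, 16

W_dkv = rng.normal(size=(d_model, d_c))  # down-projection (compress)
W_uk = rng.normal(size=(d_c, d_model))   # up-projection to keys
W_uv = rng.normal(size=(d_c, d_model))   # up-projection to values

H = rng.normal(size=(seq_len, d_model))  # hidden states of a sequence

# The cache stores only the latents: seq_len x d_c floats,
# versus 2 * seq_len * d_model for an uncompressed K/V cache.
kv_cache = H @ W_dkv

# At inference time, keys and values are recovered from the cache.
K = kv_cache @ W_uk
V = kv_cache @ W_uv

ratio = (seq_len * d_c) / (2 * seq_len * d_model)
print(K.shape, V.shape, ratio)  # cache shrinks to d_c / (2 * d_model)
```

With these toy sizes the cache holds 1/16 of the floats a plain K/V cache would, which is the inference-time bottleneck the bullet point refers to.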