Mirror of https://github.com/deepseek-ai/DeepSeek-V2.git (synced 2025-02-22 21:59:05 -05:00)
Update README.md
parent 9c7aa9ce01
commit e23eeb51a8
@@ -165,7 +165,7 @@ We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for liv
 
 ## 4. Model Architecture
 DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
-- For attention, we design IEAttn, which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
+- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
 
 <p align="center">
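The attention bullet in this diff describes low-rank key-value compression as the way to shrink the inference-time cache. As a rough illustration of that general idea only, the Python sketch below caches one small latent vector per token and rebuilds keys and values from it on demand. It is not DeepSeek-V2's implementation: the sizes, the names `W_down`, `W_up_k`, `W_up_v`, and the helper functions are all hypothetical, and details of the real MLA design are omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random weights (stand-in for trained ones)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only (assumptions, not DeepSeek-V2's configuration).
D_MODEL = 16   # hidden size of one token
N_HEADS = 4    # number of attention heads
HEAD_DIM = 4   # width of each head
D_LATENT = 4   # width of the shared compressed key-value vector

rng = random.Random(0)
W_down = random_matrix(D_LATENT, D_MODEL, rng)             # hidden -> latent
W_up_k = random_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> keys
W_up_v = random_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> values


def compress(hidden):
    """Return the latent vector that would be cached for this token."""
    return matvec(W_down, hidden)


def expand(latent):
    """Rebuild this token's keys and values from the cached latent."""
    return matvec(W_up_k, latent), matvec(W_up_v, latent)


def cache_floats_per_token(n_heads, head_dim, d_latent=None):
    """Floats cached per token per layer: full K and V, or one shared latent."""
    if d_latent is None:
        return 2 * n_heads * head_dim
    return d_latent


hidden = [rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)]
latent = compress(hidden)
keys, values = expand(latent)

full = cache_floats_per_token(N_HEADS, HEAD_DIM)
compressed = cache_floats_per_token(N_HEADS, HEAD_DIM, D_LATENT)
print(f"cached floats per token: full={full}, latent={compressed}")
# prints "cached floats per token: full=32, latent=4"
```

With these toy sizes the cache holds 4 floats per token instead of 32, at the cost of two extra up-projections whenever keys and values are needed. The saving depends entirely on how small the latent width is relative to `2 * N_HEADS * HEAD_DIM`.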