mirror of
https://github.com/deepseek-ai/DeepSeek-V3.git
synced 2025-07-05 07:51:38 -04:00
# 🚀 DeepSeek-V3: The Future of AI is Here

<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
</div>

<div align="center">

[Homepage](https://www.deepseek.com/)
[Chat](https://chat.deepseek.com/)
[Hugging Face](https://huggingface.co/deepseek-ai)
[Discord](https://discord.gg/Tc7c45Zzu5)
[Paper](https://arxiv.org/pdf/2412.19437)

</div>

---

## 📊 **Model at a Glance**

<div align="center">

| 🔥 **Metric** | 💎 **Value** | 🎯 **Description** |
|:---:|:---:|:---|
| **🧠 Total Parameters** | **671B** | Massive scale for unprecedented capabilities |
| **⚡ Activated Parameters** | **37B** | Efficient MoE activation per token |
| **📝 Context Length** | **128K** | Extended context for complex tasks |
| **🎓 Training Tokens** | **14.8T** | Diverse, high-quality training data |
| **⏱️ Total Training Time** | **2.788M H800 GPU Hours** | Remarkably efficient full training run |
| **🏆 MATH-500 Score** | **90.2%** | State-of-the-art mathematical reasoning |

</div>

---

## 🌟 **Revolutionary Features**

```
🚀 DeepSeek-V3 Architecture Overview
│
├── 🧠 Innovative Architecture
│   ├── 🔄 Auxiliary-Loss-Free Load Balancing
│   ├── 🎲 Multi-Token Prediction (MTP)
│   └── 🏗️ Multi-Head Latent Attention
│
├── ⚡ Training Efficiency
│   ├── 🔢 FP8 Mixed Precision Training
│   ├── 📡 Computation-Communication Overlap
│   └── 💎 Zero Loss Spikes/Rollbacks
│
└── 🎯 Superior Performance
    ├── 🧮 Mathematics Excellence
    ├── 💻 Code Generation Mastery
    └── 🤔 Advanced Reasoning
```

---

## 🏆 **Performance Benchmarks**

### 📚 **Academic Excellence**

<div align="center">

| 🎯 **Benchmark** | 🥈 **DeepSeek-V2** | 🥉 **Qwen2.5 72B** | 🥉 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **📖 MMLU (Accuracy)** | 78.4% | 85.0% | 84.4% | **🏆 87.1%** |
| **🧮 MATH (Exact Match)** | 43.4% | 54.4% | 49.0% | **🏆 61.6%** |
| **🧠 BBH (Exact Match)** | 78.8% | 79.8% | 82.9% | **🏆 87.5%** |
| **📊 DROP (F1 Score)** | 80.4% | 80.6% | 86.0% | **🏆 89.0%** |

</div>

### 💻 **Code Generation Mastery**

<div align="center">

| 🎯 **Benchmark** | 🥈 **DeepSeek-V2** | 🥉 **Qwen2.5 72B** | 🥉 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **👨‍💻 HumanEval (Pass@1)** | 43.3% | 53.0% | 54.9% | **🏆 65.2%** |
| **🔧 MBPP (Pass@1)** | 65.0% | 72.6% | 68.4% | **🏆 75.4%** |
| **🏃‍♂️ LiveCodeBench (Pass@1)** | 11.6% | 12.9% | 15.5% | **🏆 19.4%** |

</div>

### 🎭 **Chat Model Excellence**

<div align="center">

| 🎯 **Benchmark** | 🤖 **GPT-4o** | 🎭 **Claude-3.5-Sonnet** | 🦙 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **🏟️ Arena-Hard** | 80.4 | 85.2 | 69.3 | **🏆 85.5** |
| **🦙 AlpacaEval 2.0** | 51.1% | 52.0% | 40.5% | **🏆 70.0%** |
| **📐 AIME 2024** | 9.3% | 16.0% | 23.3% | **🏆 39.2%** |
| **🧮 MATH-500** | 74.6% | 78.3% | 73.8% | **🏆 90.2%** |

</div>

---

## 📦 **Model Downloads**

<div align="center">

### 🎯 **Choose Your Model**

| 🤖 **Model** | 📊 **Parameters** | 🔗 **Download** | ⭐ **Use Case** |
|:---|:---:|:---:|:---|
| **🔬 DeepSeek-V3-Base** | 671B (37B active) | [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) | Research & Fine-tuning |
| **💬 DeepSeek-V3-Chat** | 671B (37B active) | [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V3) | Conversations & Applications |

</div>

---

## 🌐 **Try Online**

<div align="center">

[Chat](https://chat.deepseek.com/)
[API Platform](https://platform.deepseek.com/)

**💡 Experience the power of DeepSeek-V3 without any setup!**

</div>

---

## 🚀 **Local Deployment Options**

### 🔥 **Recommended Frameworks**

<div align="center">

| 🛠️ **Framework** | 💫 **Features** | 🎯 **Best For** |
|:---|:---|:---|
| **🌊 SGLang** | MLA optimizations, FP8, Multi-node TP | **Production** |
| **🚀 LMDeploy** | FP8/BF16, Cloud deployment | **Enterprise** |
| **⚡ TensorRT-LLM** | INT4/8 quantization, NVIDIA optimization | **High Performance** |
| **🌪️ vLLM** | Pipeline parallelism, Multi-GPU | **Scalability** |
| **💡 LightLLM** | Multi-node, Mixed precision | **Flexibility** |

</div>

### 🖥️ **Hardware Support**

<div align="center">

| 🔧 **Platform** | 💻 **Hardware** | 🎨 **Precision** | 📋 **Framework** |
|:---|:---|:---|:---|
| **🟢 NVIDIA GPUs** | H100, H800, A100 | FP8, BF16, INT4/8 | All frameworks |
| **🔴 AMD GPUs** | MI300X, MI250X | FP8, BF16 | SGLang, vLLM |
| **🟠 Huawei Ascend** | 910B NPUs | BF16, INT8 | MindIE |

</div>

---

## ⚡ **Quick Start**

### 🐍 **1. Installation**

```bash
# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Install dependencies
pip install -r requirements.txt
```

### 🔧 **2. Model Conversion**

```bash
# Convert HuggingFace weights
python convert.py \
    --hf-ckpt-path /path/to/DeepSeek-V3 \
    --save-path /path/to/DeepSeek-V3-Demo \
    --n-experts 256 \
    --model-parallel 16
```

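
As a quick sanity check on the flags above, the sketch below assumes the routed experts are split evenly across the model-parallel shards, so `--n-experts` should be divisible by `--model-parallel`. The helper name `experts_per_rank` is hypothetical and is not part of `convert.py`.

```python
def experts_per_rank(n_experts: int, model_parallel: int) -> int:
    """Return how many experts each shard holds, assuming an even split."""
    if model_parallel <= 0:
        raise ValueError("model_parallel must be positive")
    if n_experts % model_parallel != 0:
        raise ValueError(
            f"{n_experts} experts cannot be split evenly across "
            f"{model_parallel} shards"
        )
    return n_experts // model_parallel


if __name__ == "__main__":
    # Values taken from the conversion command above.
    print(experts_per_rank(256, 16))
```

With the values from the command, each of the 16 shards would hold 16 experts under this assumption.
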
### 🎯 **3. Run Inference**

```bash
# Interactive chat
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
    generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json --interactive --temperature 0.7

# Batch processing
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
    generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json --input-file $FILE
```

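
Both commands start 2 nodes with 8 processes each, which gives 16 ranks and lines up with `--model-parallel 16` from the conversion step. The sketch below checks that relationship under the assumption that each converted shard is loaded by exactly one rank; `world_size` and `launch_matches_shards` are hypothetical helper names, not functions from `generate.py`.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of ranks started across all nodes."""
    return nnodes * nproc_per_node


def launch_matches_shards(nnodes: int, nproc_per_node: int, model_parallel: int) -> bool:
    """True when there is one rank per shard (an assumed requirement)."""
    return world_size(nnodes, nproc_per_node) == model_parallel


if __name__ == "__main__":
    # Values taken from the torchrun and convert.py commands above.
    print(world_size(2, 8))
    print(launch_matches_shards(2, 8, 16))
```
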

---

## 🏗️ **Architecture Deep Dive**

### 🧠 **Core Innovations**

```
┌─────────────────────────────────────────────────────────────┐
│                 🚀 DeepSeek-V3 Architecture                 │
├─────────────────────────────────────────────────────────────┤
│  🔄 Auxiliary-Loss-Free Load Balancing                      │
│  ├── ⚖️ Minimizes performance degradation                   │
│  └── 🎯 Optimal expert utilization                          │
│                                                             │
│  🎲 Multi-Token Prediction (MTP)                            │
│  ├── 🚀 Enhanced model performance                          │
│  └── ⚡ Speculative decoding acceleration                   │
│                                                             │
│  🔢 FP8 Mixed Precision Training                            │
│  ├── 💎 First extreme-scale validation                      │
│  └── ⚡ Ultimate training efficiency                        │
│                                                             │
│  🧠 Knowledge Distillation from DeepSeek-R1                 │
│  ├── 🔗 Long-Chain-of-Thought integration                   │
│  └── 🎯 Reasoning capability enhancement                    │
└─────────────────────────────────────────────────────────────┘
```

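
To make the idea of auxiliary-loss-free load balancing more concrete, here is a toy simulation. It assumes a scheme in which each expert carries a bias that is added to its routing score only when choosing the top-k experts, and the bias is nudged down for overloaded experts and up for underloaded ones after every step. All names (`route_top_k`, `update_bias`, `simulate`), the synthetic scores, and the step size `gamma` are illustrative assumptions, not the repository's implementation.

```python
import random


def route_top_k(scores, bias, k):
    """Pick the k experts with the highest biased score.

    The bias influences selection only; gating weights would still be
    computed from the raw scores (not modeled in this toy example).
    """
    ranked = sorted(range(len(scores)), key=lambda e: scores[e] + bias[e], reverse=True)
    return ranked[:k]


def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)
        elif n < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias


def simulate(n_experts=8, k=2, n_tokens=512, steps=200, gamma=0.01, seed=0):
    """Return the per-step load imbalance (max - min) and the final biases."""
    rng = random.Random(seed)
    # Synthetic skew: low-index experts start with a systematic advantage.
    skew = [0.5 * (n_experts - e) / n_experts for e in range(n_experts)]
    bias = [0.0] * n_experts
    history = []
    for _ in range(steps):
        load = [0] * n_experts
        for _ in range(n_tokens):
            scores = [skew[e] + rng.random() for e in range(n_experts)]
            for e in route_top_k(scores, bias, k):
                load[e] += 1
        history.append(max(load) - min(load))
        bias = update_bias(bias, load, gamma)
    return history, bias


if __name__ == "__main__":
    history, bias = simulate()
    print("imbalance at first step:", history[0])
    print("imbalance at last step:", history[-1])
```

In this toy setup the gap between the busiest and the idlest expert shrinks as the biases adapt, without adding any balancing term to a loss function.
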
### 📈 **Training Efficiency**

<div align="center">

| 🎯 **Metric** | 💎 **Achievement** | 🏆 **Industry Impact** |
|:---|:---|:---|
| **⏱️ Pre-training Time** | 2.664M H800 GPU hours | **Most efficient 671B model** |
| **📊 Data Volume** | 14.8T high-quality tokens | **Comprehensive knowledge base** |
| **🎯 Stability** | Zero loss spikes/rollbacks | **Unprecedented training stability** |
| **💰 Cost Efficiency** | Economical pre-training | **Accessible large-scale AI** |

</div>

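
The two numeric rows in the table can be combined into a throughput figure. The sketch below assumes the 2.664M H800 GPU hours cover the same run that consumed the 14.8T tokens; the function names are illustrative.

```python
PRETRAIN_GPU_HOURS = 2.664e6  # H800 GPU hours, from the table above
TRAINING_TOKENS = 14.8e12     # tokens, from the table above


def gpu_hours_per_trillion_tokens(gpu_hours: float, tokens: float) -> float:
    """GPU hours spent for every trillion training tokens."""
    return gpu_hours / (tokens / 1e12)


def tokens_per_gpu_hour(gpu_hours: float, tokens: float) -> float:
    """Average number of tokens processed in one GPU hour."""
    return tokens / gpu_hours


if __name__ == "__main__":
    # Roughly 180K GPU hours per trillion tokens.
    print(round(gpu_hours_per_trillion_tokens(PRETRAIN_GPU_HOURS, TRAINING_TOKENS)))
    # Roughly 5.6 million tokens per GPU hour.
    print(round(tokens_per_gpu_hour(PRETRAIN_GPU_HOURS, TRAINING_TOKENS)))
```
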

---

## 🎨 **Context Window Performance**

<div align="center">

### 🔍 **Needle in a Haystack (NIAH) Results**

```
Context Length Performance
████████████████████████████████████████ 128K ✅ Perfect
██████████████████████████████████████ 96K ✅ Excellent
████████████████████████████████████ 64K ✅ Excellent
██████████████████████████████████ 32K ✅ Perfect
████████████████████████████ 16K ✅ Perfect
████████████████████ 8K ✅ Perfect
████████████ 4K ✅ Perfect
```

**🏆 DeepSeek-V3 maintains excellent performance across all context lengths up to 128K tokens**

</div>

---

## 📄 **Research & Citation**

### 📚 **Technical Paper**

[DeepSeek-V3 Technical Report (arXiv:2412.19437)](https://arxiv.org/pdf/2412.19437)

### 📖 **Citation**

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report},
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437},
}
```

---

## 📜 **License & Usage**

<div align="center">

[Code License](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-CODE)
[Model License](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL)

**✅ Commercial use is fully supported for both Base and Chat models**

</div>

---

## 🌟 **Community & Support**

<div align="center">

### 🤝 **Join the Community**

[Homepage](https://www.deepseek.com/)
[Discord](https://discord.gg/Tc7c45Zzu5)
[Twitter](https://twitter.com/deepseek_ai)
[Email](mailto:service@deepseek.com)
[WeChat](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true)

</div>

---

<div align="center">

### 🚀 **Ready to Explore the Future?**

**DeepSeek-V3 represents a leap forward in artificial intelligence, combining unprecedented scale with remarkable efficiency. Join thousands of researchers, developers, and innovators who are already building the future with DeepSeek-V3.**

[GitHub Repository](https://github.com/deepseek-ai/DeepSeek-V3)

---

**🎯 Built with ❤️ by DeepSeek-AI • Pushing the boundaries of artificial intelligence**

</div>