DeepSeek-V3/README.md

# 🚀 DeepSeek-V3: The Future of AI is Here

<div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
</div>

<div align="center">

[![Homepage](https://img.shields.io/badge/🌐_Homepage-DeepSeek-blue?style=for-the-badge&color=007acc)](https://www.deepseek.com/)
[![Chat](https://img.shields.io/badge/🤖_Chat-DeepSeek_V3-blue?style=for-the-badge&color=00d4ff)](https://chat.deepseek.com/)
[![Hugging Face](https://img.shields.io/badge/🤗_Hugging_Face-DeepSeek_AI-yellow?style=for-the-badge&color=ffc107)](https://huggingface.co/deepseek-ai)
[![Discord](https://img.shields.io/badge/💬_Discord-Join_Community-purple?style=for-the-badge&color=7289da)](https://discord.gg/Tc7c45Zzu5)
[![Paper](https://img.shields.io/badge/📄_Paper-ArXiv-red?style=for-the-badge&color=b31b1b)](https://arxiv.org/pdf/2412.19437)

</div>

---

## 📊 **Model at a Glance**

<div align="center">

| 🔥 **Metric** | 💎 **Value** | 🎯 **Description** |
|:---:|:---:|:---|
| **🧠 Total Parameters** | **671B** | Massive scale for unprecedented capabilities |
| **⚡ Activated Parameters** | **37B** | Efficient MoE activation per token |
| **📝 Context Length** | **128K** | Extended context for complex tasks |
| **🎓 Training Tokens** | **14.8T** | Diverse, high-quality training data |
| **⏱️ Training Time** | **2.788M H800 GPU Hours** | Remarkably efficient training |
| **🏆 MATH-500 Score** | **90.2%** | State-of-the-art mathematical reasoning |

</div>

---

## 🌟 **Revolutionary Features**

```
🚀 DeepSeek-V3 Architecture Overview
│
├── 🧠 Innovative Architecture
│   ├── 🔄 Auxiliary-Loss-Free Load Balancing
│   ├── 🎲 Multi-Token Prediction (MTP)
│   └── 🏗️ Multi-Head Latent Attention
│
├── ⚡ Training Efficiency
│   ├── 🔢 FP8 Mixed Precision Training
│   ├── 📡 Computation-Communication Overlap
│   └── 💎 Zero Loss Spikes/Rollbacks
│
└── 🎯 Superior Performance
    ├── 🧮 Mathematics Excellence
    ├── 💻 Code Generation Mastery
    └── 🤔 Advanced Reasoning
```

---

## 🏆 **Performance Benchmarks**

### 📚 **Academic Excellence**

<div align="center">

| 🎯 **Benchmark** | 🥈 **DeepSeek-V2** | 🥉 **Qwen2.5 72B** | 🥉 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **📖 MMLU (Accuracy)** | 78.4% | 85.0% | 84.4% | **🏆 87.1%** |
| **🧮 MATH (Exact Match)** | 43.4% | 54.4% | 49.0% | **🏆 61.6%** |
| **🧠 BBH (Exact Match)** | 78.8% | 79.8% | 82.9% | **🏆 87.5%** |
| **📊 DROP (F1 Score)** | 80.4% | 80.6% | 86.0% | **🏆 89.0%** |

</div>

### 💻 **Code Generation Mastery**

<div align="center">

| 🎯 **Benchmark** | 🥈 **DeepSeek-V2** | 🥉 **Qwen2.5 72B** | 🥉 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **👨‍💻 HumanEval (Pass@1)** | 43.3% | 53.0% | 54.9% | **🏆 65.2%** |
| **🔧 MBPP (Pass@1)** | 65.0% | 72.6% | 68.4% | **🏆 75.4%** |
| **🏃‍♂️ LiveCodeBench (Pass@1)** | 11.6% | 12.9% | 15.5% | **🏆 19.4%** |

</div>

### 🎭 **Chat Model Excellence**

<div align="center">

| 🎯 **Benchmark** | 🤖 **GPT-4o** | 🎭 **Claude-3.5-Sonnet** | 🦙 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **🏟️ Arena-Hard** | 80.4 | 85.2 | 69.3 | **🏆 85.5** |
| **🦙 AlpacaEval 2.0** | 51.1% | 52.0% | 40.5% | **🏆 70.0%** |
| **📐 AIME 2024** | 9.3% | 16.0% | 23.3% | **🏆 39.2%** |
| **🧮 MATH-500** | 74.6% | 78.3% | 73.8% | **🏆 90.2%** |

</div>

---

## 📦 **Model Downloads**

<div align="center">

### 🎯 **Choose Your Model**

| 🤖 **Model** | 📊 **Parameters** | 🔗 **Download** | ⭐ **Use Case** |
|:---|:---:|:---:|:---|
| **🔬 DeepSeek-V3-Base** | 671B (37B active) | [![🤗 Download](https://img.shields.io/badge/🤗_Download-Base_Model-blue?style=for-the-badge)](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) | Research & Fine-tuning |
| **💬 DeepSeek-V3-Chat** | 671B (37B active) | [![🤗 Download](https://img.shields.io/badge/🤗_Download-Chat_Model-green?style=for-the-badge)](https://huggingface.co/deepseek-ai/DeepSeek-V3) | Conversations & Applications |

</div>

---

## 🌐 **Try Online**

<div align="center">

[![🌐 Chat Interface](https://img.shields.io/badge/🌐_Try_DeepSeek_V3-Chat_Interface-blue?style=for-the-badge&color=00d4ff)](https://chat.deepseek.com/)
[![🔌 API Platform](https://img.shields.io/badge/🔌_Developer_API-Platform-orange?style=for-the-badge&color=ff6b35)](https://platform.deepseek.com/)

**💡 Experience the power of DeepSeek-V3 without any setup!**

</div>

---

## 🚀 **Local Deployment Options**

### 🔥 **Recommended Frameworks**

<div align="center">

| 🛠️ **Framework** | 💫 **Features** | 🎯 **Best For** | 📱 **Status** |
|:---|:---|:---|:---:|
| **🌊 SGLang** | MLA optimizations, FP8, Multi-node TP | **Production** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **🚀 LMDeploy** | FP8/BF16, Cloud deployment | **Enterprise** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **⚡ TensorRT-LLM** | INT4/8 quantization, NVIDIA optimization | **High Performance** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **🌪️ vLLM** | Pipeline parallelism, Multi-GPU | **Scalability** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **💡 LightLLM** | Multi-node, Mixed precision | **Flexibility** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |

</div>

### 🖥️ **Hardware Support**

<div align="center">

| 🔧 **Platform** | 💻 **Hardware** | 🎨 **Precision** | 📋 **Framework** |
|:---|:---|:---|:---|
| **🟢 NVIDIA GPUs** | H100, H800, A100 | FP8, BF16, INT4/8 | All frameworks |
| **🔴 AMD GPUs** | MI300X, MI250X | FP8, BF16 | SGLang, vLLM |
| **🟠 Huawei Ascend** | 910B NPUs | BF16, INT8 | MindIE |

</div>

---

## ⚡ **Quick Start**

### 🐍 **1. Installation**

```bash
# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Install dependencies
pip install -r requirements.txt
```

### 🔧 **2. Model Conversion**

```bash
# Convert HuggingFace weights
python convert.py \
  --hf-ckpt-path /path/to/DeepSeek-V3 \
  --save-path /path/to/DeepSeek-V3-Demo \
  --n-experts 256 \
  --model-parallel 16
```

### 🎯 **3. Run Inference**

```bash
# Interactive chat
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --interactive --temperature 0.7

# Batch processing
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --input-file $FILE
```

---

## 🏗️ **Architecture Deep Dive**

### 🧠 **Core Innovations**

```
┌─────────────────────────────────────────────────────────────┐
│                   🚀 DeepSeek-V3 Architecture               │
├─────────────────────────────────────────────────────────────┤
│  🔄 Auxiliary-Loss-Free Load Balancing                     │
│   ├── ⚖️  Minimizes performance degradation                │
│   └── 🎯 Optimal expert utilization                        │
│                                                             │
│  🎲 Multi-Token Prediction (MTP)                           │
│   ├── 🚀 Enhanced model performance                        │
│   └── ⚡ Speculative decoding acceleration                 │
│                                                             │
│  🔢 FP8 Mixed Precision Training                           │
│   ├── 💎 First extreme-scale validation                    │
│   └── ⚡ Ultimate training efficiency                      │
│                                                             │
│  🧠 Knowledge Distillation from DeepSeek-R1               │
│   ├── 🔗 Long-Chain-of-Thought integration                │
│   └── 🎯 Reasoning capability enhancement                  │
└─────────────────────────────────────────────────────────────┘
```

### 📈 **Training Efficiency**

<div align="center">

| 🎯 **Metric** | 💎 **Achievement** | 🏆 **Industry Impact** |
|:---|:---|:---|
| **⏱️ Training Time** | 2.664M H800 GPU hours | **Most efficient 671B model** |
| **📊 Data Volume** | 14.8T high-quality tokens | **Comprehensive knowledge base** |
| **🎯 Stability** | Zero loss spikes/rollbacks | **Unprecedented training stability** |
| **💰 Cost Efficiency** | Economical pre-training | **Accessible large-scale AI** |

</div>

---

## 🎨 **Context Window Performance**

<div align="center">

### 🔍 **Needle in a Haystack (NIAH) Results**

```
Context Length Performance
████████████████████████████████████████ 128K ✅ Perfect
██████████████████████████████████████   96K  ✅ Excellent
████████████████████████████████████     64K  ✅ Excellent
██████████████████████████████████       32K  ✅ Perfect
████████████████████████████             16K  ✅ Perfect
████████████████████                      8K  ✅ Perfect
████████████                              4K  ✅ Perfect
```

**🏆 DeepSeek-V3 maintains excellent performance across all context lengths up to 128K tokens**

</div>

---

## 📄 **Research & Citation**

### 📚 **Technical Paper**

[![📄 Read Paper](https://img.shields.io/badge/📄_Read_Paper-ArXiv_2412.19437-red?style=for-the-badge)](https://arxiv.org/pdf/2412.19437)

### 📖 **Citation**

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
    title={DeepSeek-V3 Technical Report},
    author={DeepSeek-AI},
    year={2024},
    eprint={2412.19437},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.19437},
}
```

---

## 📜 **License & Usage**

<div align="center">

[![Code License](https://img.shields.io/badge/Code_License-MIT-green?style=for-the-badge)](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-CODE)
[![Model License](https://img.shields.io/badge/Model_License-Commercial_Use_Supported-blue?style=for-the-badge)](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL)

**✅ Commercial use is fully supported for both Base and Chat models**

</div>

---

## 🌟 **Community & Support**

<div align="center">

### 🤝 **Join the Community**

[![🌐 Homepage](https://img.shields.io/badge/🌐_Homepage-DeepSeek.com-blue?style=social)](https://www.deepseek.com/)
[![💬 Discord](https://img.shields.io/badge/💬_Discord-Join_Chat-purple?style=social)](https://discord.gg/Tc7c45Zzu5)
[![🐦 Twitter](https://img.shields.io/badge/🐦_Twitter-@deepseek__ai-blue?style=social)](https://twitter.com/deepseek_ai)
[![📧 Email](https://img.shields.io/badge/📧_Email-service@deepseek.com-red?style=social)](mailto:service@deepseek.com)
[![💬 WeChat](https://img.shields.io/badge/💬_WeChat-DeepSeek_AI-green?style=social)](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true)

</div>

---

<div align="center">

### 🚀 **Ready to Explore the Future?**

**DeepSeek-V3 represents a leap forward in artificial intelligence, combining unprecedented scale with remarkable efficiency. Join thousands of researchers, developers, and innovators who are already building the future with DeepSeek-V3.**

[![🌟 Star this repo](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3)
[![👁️ Watch for updates](https://img.shields.io/github/watchers/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3)
[![🍴 Fork and contribute](https://img.shields.io/github/forks/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3)

---

**🎯 Built with ❤️ by DeepSeek-AI • Pushing the boundaries of artificial intelligence**

</div>