# 🚀 DeepSeek-V3: The Future of AI is Here
[![Homepage](https://img.shields.io/badge/🌐_Homepage-DeepSeek-blue?style=for-the-badge&color=007acc)](https://www.deepseek.com/) [![Chat](https://img.shields.io/badge/🤖_Chat-DeepSeek_V3-blue?style=for-the-badge&color=00d4ff)](https://chat.deepseek.com/) [![Hugging Face](https://img.shields.io/badge/🤗_Hugging_Face-DeepSeek_AI-yellow?style=for-the-badge&color=ffc107)](https://huggingface.co/deepseek-ai) [![Discord](https://img.shields.io/badge/💬_Discord-Join_Community-purple?style=for-the-badge&color=7289da)](https://discord.gg/Tc7c45Zzu5) [![Paper](https://img.shields.io/badge/📄_Paper-ArXiv-red?style=for-the-badge&color=b31b1b)](https://arxiv.org/pdf/2412.19437)
---

## 📊 **Model at a Glance**
| 🔥 **Metric** | 💎 **Value** | 🎯 **Description** |
|:---:|:---:|:---|
| **🧠 Total Parameters** | **671B** | Mixture-of-Experts model at massive scale |
| **⚡ Activated Parameters** | **37B** | Activated per token by the MoE router |
| **📝 Context Length** | **128K** | Extended context for long, complex tasks |
| **🎓 Training Tokens** | **14.8T** | Diverse, high-quality training data |
| **⏱️ Training Compute** | **2.788M H800 GPU hours** | Full run, including pre-training, context extension, and post-training |
| **🏆 MATH-500 Score** | **90.2%** | State-of-the-art mathematical reasoning |
---

## 🌟 **Revolutionary Features**

```
🚀 DeepSeek-V3 Architecture Overview
│
├── 🧠 Innovative Architecture
│   ├── 🔄 Auxiliary-Loss-Free Load Balancing
│   ├── 🎲 Multi-Token Prediction (MTP)
│   └── 🏗️ Multi-Head Latent Attention
│
├── ⚡ Training Efficiency
│   ├── 🔢 FP8 Mixed Precision Training
│   ├── 📡 Computation-Communication Overlap
│   └── 💎 Zero Loss Spikes/Rollbacks
│
└── 🎯 Superior Performance
    ├── 🧮 Mathematics Excellence
    ├── 💻 Code Generation Mastery
    └── 🤔 Advanced Reasoning
```

---

## 🏆 **Performance Benchmarks**

### 📚 **Academic Excellence**
| 🎯 **Benchmark** | **DeepSeek-V2** | **Qwen2.5 72B** | **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **📖 MMLU (Accuracy)** | 78.4% | 85.0% | 84.4% | **🏆 87.1%** |
| **🧮 MATH (Exact Match)** | 43.4% | 54.4% | 49.0% | **🏆 61.6%** |
| **🧠 BBH (Exact Match)** | 78.8% | 79.8% | 82.9% | **🏆 87.5%** |
| **📊 DROP (F1 Score)** | 80.4% | 80.6% | 86.0% | **🏆 89.0%** |
### 💻 **Code Generation Mastery**
| 🎯 **Benchmark** | **DeepSeek-V2** | **Qwen2.5 72B** | **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **👨‍💻 HumanEval (Pass@1)** | 43.3% | 53.0% | 54.9% | **🏆 65.2%** |
| **🔧 MBPP (Pass@1)** | 65.0% | 72.6% | 68.4% | **🏆 75.4%** |
| **🏃‍♂️ LiveCodeBench (Pass@1)** | 11.6% | 12.9% | 15.5% | **🏆 19.4%** |
### 🎭 **Chat Model Excellence**
| 🎯 **Benchmark** | 🤖 **GPT-4o** | 🎭 **Claude-3.5-Sonnet** | 🦙 **LLaMA3.1 405B** | 🥇 **DeepSeek-V3** |
|:---|:---:|:---:|:---:|:---:|
| **🏟️ Arena-Hard** | 80.4 | 85.2 | 69.3 | **🏆 85.5** |
| **🦙 AlpacaEval 2.0** | 51.1% | 52.0% | 40.5% | **🏆 70.0%** |
| **📐 AIME 2024** | 9.3% | 16.0% | 23.3% | **🏆 39.2%** |
| **🧮 MATH-500** | 74.6% | 78.3% | 73.8% | **🏆 90.2%** |
---

## 📦 **Model Downloads**
### 🎯 **Choose Your Model**

| 🤖 **Model** | 📊 **Parameters** | 🔗 **Download** | ⭐ **Use Case** |
|:---|:---:|:---:|:---|
| **🔬 DeepSeek-V3-Base** | 671B (37B activated) | [![🤗 Download](https://img.shields.io/badge/🤗_Download-Base_Model-blue?style=for-the-badge)](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) | Research & fine-tuning |
| **💬 DeepSeek-V3-Chat** | 671B (37B activated) | [![🤗 Download](https://img.shields.io/badge/🤗_Download-Chat_Model-green?style=for-the-badge)](https://huggingface.co/deepseek-ai/DeepSeek-V3) | Conversations & applications |
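If you prefer to fetch the weights from the command line, the minimal sketch below uses the `huggingface_hub` CLI; the local directory is a placeholder, and the checkpoint is several hundred gigabytes, so plan storage accordingly.

```bash
# Sketch: download the chat model weights with the huggingface_hub CLI.
# Install the CLI first if it is not already available.
pip install -U "huggingface_hub[cli]"

# /path/to/DeepSeek-V3 is a placeholder target directory; the weights are very large.
huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir /path/to/DeepSeek-V3
```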
---

## 🌐 **Try Online**
[![🌐 Chat Interface](https://img.shields.io/badge/🌐_Try_DeepSeek_V3-Chat_Interface-blue?style=for-the-badge&color=00d4ff)](https://chat.deepseek.com/) [![🔌 API Platform](https://img.shields.io/badge/🔌_Developer_API-Platform-orange?style=for-the-badge&color=ff6b35)](https://platform.deepseek.com/)

**💡 Experience the power of DeepSeek-V3 without any setup!**
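To call the model programmatically instead, the hedged sketch below assumes the developer platform's OpenAI-compatible `/chat/completions` endpoint, a `deepseek-chat` model name, and an API key exported as `DEEPSEEK_API_KEY`; confirm all three against the platform documentation.

```bash
# Sketch of a single chat-completion request against the OpenAI-compatible API.
# Endpoint, model name, and authentication should be verified in the API docs.
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Summarize DeepSeek-V3 in one sentence."}]
      }'
```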
---

## 🚀 **Local Deployment Options**

### 🔥 **Recommended Frameworks**
| 🛠️ **Framework** | 💫 **Features** | 🎯 **Best For** | 📱 **Status** |
|:---|:---|:---|:---:|
| **🌊 SGLang** | MLA optimizations, FP8, multi-node TP | **Production** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **🚀 LMDeploy** | FP8/BF16, cloud deployment | **Enterprise** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **⚡ TensorRT-LLM** | INT4/8 quantization, NVIDIA optimization | **High performance** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **🌪️ vLLM** | Pipeline parallelism, multi-GPU | **Scalability** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
| **💡 LightLLM** | Multi-node, mixed precision | **Flexibility** | [![✅](https://img.shields.io/badge/✅-Ready-green)](#) |
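As one illustration, a vLLM deployment might look like the sketch below. This is not an official recipe: the parallelism sizes are placeholders that must match your GPU topology, and exact flags can vary between vLLM releases.

```bash
# Illustrative sketch: serve DeepSeek-V3 behind vLLM's OpenAI-compatible server.
# Parallelism sizes are placeholders; the full 671B model needs a multi-GPU
# (typically multi-node) setup, so adjust these to your hardware and vLLM version.
vllm serve deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2 \
  --trust-remote-code \
  --max-model-len 8192
```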
### 🖥️ **Hardware Support**
| 🔧 **Platform** | 💻 **Hardware** | 🎨 **Precision** | 📋 **Framework** |
|:---|:---|:---|:---|
| **🟢 NVIDIA GPUs** | H100, H800, A100 | FP8, BF16, INT4/8 | All frameworks |
| **🔴 AMD GPUs** | MI300X, MI250X | FP8, BF16 | SGLang, vLLM |
| **🟠 Huawei Ascend** | 910B NPUs | BF16, INT8 | MindIE |
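For the SGLang route listed above (which the table also lists for AMD GPUs), a rough single-node launch sketch is shown below; the flags and GPU count are illustrative and should be checked against the SGLang documentation for your release.

```bash
# Illustrative SGLang server launch; --tp must equal the number of local GPUs,
# and the port is an arbitrary choice. Verify current flags in the SGLang docs.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --port 30000
```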
---

## ⚡ **Quick Start**

### 🐍 **1. Installation**

```bash
# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Install dependencies
pip install -r requirements.txt
```

### 🔧 **2. Model Conversion**

```bash
# Convert Hugging Face weights
python convert.py \
  --hf-ckpt-path /path/to/DeepSeek-V3 \
  --save-path /path/to/DeepSeek-V3-Demo \
  --n-experts 256 \
  --model-parallel 16
```

### 🎯 **3. Run Inference**

```bash
# Interactive chat
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --interactive --temperature 0.7

# Batch processing
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --input-file $FILE
```

---

## 🏗️ **Architecture Deep Dive**

### 🧠 **Core Innovations**

```
┌─────────────────────────────────────────────────────────────┐
│                 🚀 DeepSeek-V3 Architecture                 │
├─────────────────────────────────────────────────────────────┤
│  🔄 Auxiliary-Loss-Free Load Balancing                      │
│   ├── ⚖️ Minimizes performance degradation                  │
│   └── 🎯 Optimal expert utilization                         │
│                                                             │
│  🎲 Multi-Token Prediction (MTP)                            │
│   ├── 🚀 Enhanced model performance                         │
│   └── ⚡ Speculative decoding acceleration                  │
│                                                             │
│  🔢 FP8 Mixed Precision Training                            │
│   ├── 💎 First extreme-scale validation                     │
│   └── ⚡ Ultimate training efficiency                       │
│                                                             │
│  🧠 Knowledge Distillation from DeepSeek-R1                 │
│   ├── 🔗 Long-Chain-of-Thought integration                  │
│   └── 🎯 Reasoning capability enhancement                   │
└─────────────────────────────────────────────────────────────┘
```

### 📈 **Training Efficiency**
| 🎯 **Metric** | 💎 **Achievement** | 🏆 **Industry Impact** |
|:---|:---|:---|
| **⏱️ Pre-Training Compute** | 2.664M H800 GPU hours (2.788M including context extension and post-training) | **Highly efficient training for a 671B-scale model** |
| **📊 Data Volume** | 14.8T high-quality tokens | **Comprehensive knowledge base** |
| **🎯 Stability** | Zero loss spikes or rollbacks | **Unprecedented training stability** |
| **💰 Cost Efficiency** | Economical pre-training | **Accessible large-scale AI** |
---

## 🎨 **Context Window Performance**
### 🔍 **Needle in a Haystack (NIAH) Results**

```
Context Length Performance
████████████████████████████████████████ 128K ✅ Perfect
██████████████████████████████████████    96K ✅ Excellent
████████████████████████████████████      64K ✅ Excellent
██████████████████████████████████        32K ✅ Perfect
████████████████████████████              16K ✅ Perfect
████████████████████                        8K ✅ Perfect
████████████                                4K ✅ Perfect
```

**🏆 DeepSeek-V3 maintains excellent performance across all context lengths up to 128K tokens.**
---

## 📄 **Research & Citation**

### 📚 **Technical Paper**

[![📄 Read Paper](https://img.shields.io/badge/📄_Read_Paper-ArXiv_2412.19437-red?style=for-the-badge)](https://arxiv.org/pdf/2412.19437)

### 📖 **Citation**

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report},
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437},
}
```

---

## 📜 **License & Usage**
[![Code License](https://img.shields.io/badge/Code_License-MIT-green?style=for-the-badge)](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-CODE) [![Model License](https://img.shields.io/badge/Model_License-Commercial_Use_Supported-blue?style=for-the-badge)](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL)

**✅ Commercial use is fully supported for both the Base and Chat models.**
---

## 🌟 **Community & Support**
### 🤝 **Join the Community**

[![🌐 Homepage](https://img.shields.io/badge/🌐_Homepage-DeepSeek.com-blue?style=social)](https://www.deepseek.com/) [![💬 Discord](https://img.shields.io/badge/💬_Discord-Join_Chat-purple?style=social)](https://discord.gg/Tc7c45Zzu5) [![🐦 Twitter](https://img.shields.io/badge/🐦_Twitter-@deepseek__ai-blue?style=social)](https://twitter.com/deepseek_ai) [![📧 Email](https://img.shields.io/badge/📧_Email-service@deepseek.com-red?style=social)](mailto:service@deepseek.com) [![💬 WeChat](https://img.shields.io/badge/💬_WeChat-DeepSeek_AI-green?style=social)](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true)
---
### 🚀 **Ready to Explore the Future?**

**DeepSeek-V3 represents a leap forward in artificial intelligence, combining unprecedented scale with remarkable efficiency. Join the researchers, developers, and innovators already building with DeepSeek-V3.**

[![🌟 Star this repo](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3) [![👁️ Watch for updates](https://img.shields.io/github/watchers/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3) [![🍴 Fork and contribute](https://img.shields.io/github/forks/deepseek-ai/DeepSeek-V3?style=social)](https://github.com/deepseek-ai/DeepSeek-V3)

---

**🎯 Built with ❤️ by DeepSeek-AI • Pushing the boundaries of artificial intelligence**