
# 🚀 DeepSeek-V3: The Future of AI is Here

Homepage • Chat • Hugging Face • Discord • Paper


## 📊 Model at a Glance

| 🔥 Metric | 💎 Value | 🎯 Description |
|---|---|---|
| 🧠 Total Parameters | 671B | Massive scale for unprecedented capabilities |
| ⚡ Activated Parameters | 37B | Efficient MoE activation per token |
| 📝 Context Length | 128K | Extended context for complex tasks |
| 🎓 Training Tokens | 14.8T | Diverse, high-quality training data |
| ⏱️ Training Time | 2.788M H800 GPU hours | Remarkably efficient training |
| 🏆 MATH-500 Score | 90.2% | State-of-the-art mathematical reasoning |

## 🌟 Revolutionary Features

```
🚀 DeepSeek-V3 Architecture Overview
│
├── 🧠 Innovative Architecture
│   ├── 🔄 Auxiliary-Loss-Free Load Balancing
│   ├── 🎲 Multi-Token Prediction (MTP)
│   └── 🏗️ Multi-Head Latent Attention
│
├── ⚡ Training Efficiency
│   ├── 🔢 FP8 Mixed Precision Training
│   ├── 📡 Computation-Communication Overlap
│   └── 💎 Zero Loss Spikes/Rollbacks
│
└── 🎯 Superior Performance
    ├── 🧮 Mathematics Excellence
    ├── 💻 Code Generation Mastery
    └── 🤔 Advanced Reasoning
```
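The auxiliary-loss-free load balancing listed above can be sketched in a few lines. This is a toy illustration of the bias-based routing idea described in the technical report, not DeepSeek-V3's implementation: the expert count, top-k, update rate `GAMMA`, and Gaussian affinity scores are all hypothetical stand-ins.

```python
import random

random.seed(0)

NUM_EXPERTS = 8
TOP_K = 2
GAMMA = 0.01  # bias update speed; illustrative, not the paper's value

# One routing bias per expert, adjusted online instead of via an auxiliary loss.
bias = [0.0] * NUM_EXPERTS

def route(scores):
    """Top-k experts by affinity score + bias. The bias steers selection
    only; real gating weights would still come from the raw scores."""
    ranked = sorted(range(NUM_EXPERTS),
                    key=lambda e: scores[e] + bias[e], reverse=True)
    return ranked[:TOP_K]

def train_step(batch_scores):
    """Route one batch, then nudge each expert's bias toward balanced load."""
    loads = [0] * NUM_EXPERTS
    for scores in batch_scores:
        for e in route(scores):
            loads[e] += 1
    mean = sum(loads) / NUM_EXPERTS
    for e in range(NUM_EXPERTS):
        bias[e] += GAMMA if loads[e] < mean else -GAMMA
    return loads

def make_batch(n=512):
    # Skewed token-to-expert affinities: experts 0 and 1 are preferred.
    return [[random.gauss(1.0 if e < 2 else 0.0, 0.5) for e in range(NUM_EXPERTS)]
            for _ in range(n)]

before = train_step(make_batch())
for _ in range(300):
    train_step(make_batch())
after = train_step(make_batch())
print("loads before balancing:", before)
print("loads after balancing: ", after)
```

Because balance is restored by nudging a non-differentiable routing bias, no balancing term has to compete with the language-modeling loss — which is the point of the "auxiliary-loss-free" design.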

## 🏆 Performance Benchmarks

### 📚 Academic Excellence

| 🎯 Benchmark | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 📖 MMLU (Accuracy) | 78.4% | 85.0% | 84.4% | 🏆 87.1% |
| 🧮 MATH (Exact Match) | 43.4% | 54.4% | 49.0% | 🏆 61.6% |
| 🧠 BBH (Exact Match) | 78.8% | 79.8% | 82.9% | 🏆 87.5% |
| 📊 DROP (F1 Score) | 80.4% | 80.6% | 86.0% | 🏆 89.0% |

### 💻 Code Generation Mastery

| 🎯 Benchmark | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 👨‍💻 HumanEval (Pass@1) | 43.3% | 53.0% | 54.9% | 🏆 65.2% |
| 🔧 MBPP (Pass@1) | 65.0% | 72.6% | 68.4% | 🏆 75.4% |
| 🏃‍♂️ LiveCodeBench (Pass@1) | 11.6% | 12.9% | 15.5% | 🏆 19.4% |

### 🎭 Chat Model Excellence

| 🎯 Benchmark | 🤖 GPT-4o | 🎭 Claude-3.5-Sonnet | 🦙 LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 🏟️ Arena-Hard | 80.4 | 85.2 | 69.3 | 🏆 85.5 |
| 🦙 AlpacaEval 2.0 | 51.1% | 52.0% | 40.5% | 🏆 70.0% |
| 📐 AIME 2024 | 9.3% | 16.0% | 23.3% | 🏆 39.2% |
| 🧮 MATH-500 | 74.6% | 78.3% | 73.8% | 🏆 90.2% |

## 📦 Model Downloads

### 🎯 Choose Your Model

| 🤖 Model | 📊 Parameters | 🔗 Download | 🎯 Use Case |
|---|---|---|---|
| 🔬 DeepSeek-V3-Base | 671B (37B active) | 🤗 Download | Research & Fine-tuning |
| 💬 DeepSeek-V3-Chat | 671B (37B active) | 🤗 Download | Conversations & Applications |

### 🌐 Try Online

🌐 Chat Interface • 🔌 API Platform

💡 Experience the power of DeepSeek-V3 without any setup!
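For the API route, DeepSeek exposes an OpenAI-compatible `chat/completions` endpoint. The sketch below builds such a request with only the standard library; the base URL and model id follow DeepSeek's public API documentation at the time of writing and should be verified against the current docs, and `DEEPSEEK_API_KEY` is a placeholder environment variable.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completion request against the DeepSeek API.
API_BASE = "https://api.deepseek.com"
MODEL = "deepseek-chat"  # the DeepSeek-V3 chat model

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Multi-Token Prediction in one sentence."},
    ],
    "temperature": 0.7,
}

api_key = os.environ.get("DEEPSEEK_API_KEY")  # placeholder variable name
if api_key:
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("DEEPSEEK_API_KEY not set; payload:", json.dumps(payload)[:60], "...")
```

Any OpenAI-compatible client library should also work by pointing its base URL at the endpoint above.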


## 🚀 Local Deployment Options

| 🛠️ Framework | 💫 Features | 🎯 Best For | 📱 Status |
|---|---|---|---|
| 🌊 SGLang | MLA optimizations, FP8, Multi-node TP | Production | ✅ |
| 🚀 LMDeploy | FP8/BF16, Cloud deployment | Enterprise | ✅ |
| ⚡ TensorRT-LLM | INT4/8 quantization, NVIDIA optimization | High Performance | ✅ |
| 🌪️ vLLM | Pipeline parallelism, Multi-GPU | Scalability | ✅ |
| 💡 LightLLM | Multi-node, Mixed precision | Flexibility | ✅ |

### 🖥️ Hardware Support

| 🔧 Platform | 💻 Hardware | 🎨 Precision | 📋 Framework |
|---|---|---|---|
| 🟢 NVIDIA GPUs | H100, H800, A100 | FP8, BF16, INT4/8 | All frameworks |
| 🔴 AMD GPUs | MI300X, MI250X | FP8, BF16 | SGLang, vLLM |
| 🟠 Huawei Ascend | 910B NPUs | BF16, INT8 | MindIE |

## Quick Start

### 🐍 1. Installation

```shell
# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Install dependencies
pip install -r requirements.txt
```

### 🔧 2. Model Conversion

```shell
# Convert HuggingFace weights
python convert.py \
  --hf-ckpt-path /path/to/DeepSeek-V3 \
  --save-path /path/to/DeepSeek-V3-Demo \
  --n-experts 256 \
  --model-parallel 16
```

### 🎯 3. Run Inference

```shell
# Interactive chat
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --interactive --temperature 0.7

# Batch processing
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --input-file $FILE
```

## 🏗️ Architecture Deep Dive

### 🧠 Core Innovations

```
┌─────────────────────────────────────────────────────────────┐
│                   🚀 DeepSeek-V3 Architecture               │
├─────────────────────────────────────────────────────────────┤
│  🔄 Auxiliary-Loss-Free Load Balancing                      │
│   ├── ⚖️  Minimizes performance degradation                 │
│   └── 🎯 Optimal expert utilization                         │
│                                                             │
│  🎲 Multi-Token Prediction (MTP)                            │
│   ├── 🚀 Enhanced model performance                         │
│   └── ⚡ Speculative decoding acceleration                  │
│                                                             │
│  🔢 FP8 Mixed Precision Training                            │
│   ├── 💎 First extreme-scale validation                     │
│   └── ⚡ Ultimate training efficiency                       │
│                                                             │
│  🧠 Knowledge Distillation from DeepSeek-R1                 │
│   ├── 🔗 Long-Chain-of-Thought integration                  │
│   └── 🎯 Reasoning capability enhancement                   │
└─────────────────────────────────────────────────────────────┘
```
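The "speculative decoding acceleration" that MTP enables can be illustrated with a toy draft-and-verify loop. The two "models" below are cheap stand-in functions, not DeepSeek-V3; only the accept/verify control flow reflects the actual technique.

```python
def target_next(prefix):
    """Stand-in for the full model's greedy next-token choice."""
    return (sum(prefix) * 31 + 7) % 50

def draft_next(prefix):
    """Stand-in MTP head: agrees with the target except every 4th position."""
    t = target_next(prefix)
    return t if len(prefix) % 4 else (t + 1) % 50

def greedy_decode(prefix, steps):
    """Baseline: one expensive target-model call per generated token."""
    out = list(prefix)
    for _ in range(steps):
        out.append(target_next(out))
    return out, steps  # (tokens, target-model calls)

def speculative_decode(prefix, steps, k=4):
    """Draft k tokens cheaply, then verify them with ONE batched target
    pass (a transformer scores every draft position in a single forward),
    keeping the agreed prefix plus the target's own token at the first
    disagreement. Output is identical to greedy_decode."""
    out = list(prefix)
    calls = 0
    produced = 0
    while produced < steps:
        draft = []
        while len(draft) < k and produced + len(draft) < steps:
            draft.append(draft_next(out + draft))
        calls += 1  # one batched verification pass for the whole draft
        cur = list(out)
        for tok in draft:
            t = target_next(cur)
            cur.append(t)  # t == tok whenever the draft is accepted
            produced += 1
            if t != tok:
                break      # first mismatch: keep the correction, drop the rest
        out = cur
    return out, calls

spec_out, spec_calls = speculative_decode([1, 2, 3], steps=32)
base_out, base_calls = greedy_decode([1, 2, 3], steps=32)
print("outputs identical:", spec_out == base_out)
print(f"target passes: {spec_calls} speculative vs {base_calls} greedy")
```

With greedy verification the speculative output matches plain greedy decoding token for token; the saving comes entirely from verifying several drafted tokens per expensive target pass.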

### 📈 Training Efficiency

| 🎯 Metric | 💎 Achievement | 🏆 Industry Impact |
|---|---|---|
| ⏱️ Training Time | 2.664M H800 GPU hours (pre-training; 2.788M total) | Exceptional efficiency at 671B scale |
| 📊 Data Volume | 14.8T high-quality tokens | Comprehensive knowledge base |
| 🎯 Stability | Zero loss spikes or rollbacks | Unprecedented training stability |
| 💰 Cost Efficiency | Economical pre-training | Accessible large-scale AI |
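The technical report prices training at an assumed rental rate of $2 per H800 GPU hour, which makes the headline cost figures easy to reproduce:

```python
# Back-of-the-envelope training cost, using the rental rate assumed in the
# DeepSeek-V3 technical report ($2 per H800 GPU hour).
PRE_TRAINING_HOURS = 2.664e6   # pre-training on 14.8T tokens
TOTAL_HOURS = 2.788e6          # plus context extension and post-training
RATE_USD_PER_HOUR = 2.0        # the report's assumed H800 rental price

pre_cost = PRE_TRAINING_HOURS * RATE_USD_PER_HOUR
total_cost = TOTAL_HOURS * RATE_USD_PER_HOUR
print(f"pre-training: ${pre_cost / 1e6:.3f}M")   # $5.328M
print(f"total:        ${total_cost / 1e6:.3f}M")  # $5.576M
```

These match the roughly $5.6M total the report quotes; note the rate is the report's pricing assumption, not an invoice.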

## 🎨 Context Window Performance

### 🔍 Needle in a Haystack (NIAH) Results

```
Context Length Performance
████████████████████████████████████████ 128K ✅ Perfect
██████████████████████████████████████   96K  ✅ Excellent
████████████████████████████████████     64K  ✅ Excellent
██████████████████████████████████       32K  ✅ Perfect
████████████████████████████             16K  ✅ Perfect
████████████████████                      8K  ✅ Perfect
████████████                              4K  ✅ Perfect
```

🏆 DeepSeek-V3 maintains excellent performance across all context lengths up to 128K tokens.

## 📄 Research & Citation

### 📚 Technical Paper

📄 Read Paper: https://arxiv.org/abs/2412.19437

### 📖 Citation

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
    title={DeepSeek-V3 Technical Report},
    author={DeepSeek-AI},
    year={2024},
    eprint={2412.19437},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.19437},
}
```

## 📜 License & Usage

Code License • Model License

Commercial use is fully supported for both the Base and Chat models.


## 🌟 Community & Support

### 🤝 Join the Community

🌐 Homepage • 💬 Discord • 🐦 Twitter • 📧 Email • 💬 WeChat


## 🚀 Ready to Explore the Future?

DeepSeek-V3 represents a leap forward in artificial intelligence, combining unprecedented scale with remarkable efficiency. Join thousands of researchers, developers, and innovators who are already building the future with DeepSeek-V3.

🌟 Star this repo • 👁️ Watch for updates • 🍴 Fork and contribute

---

🎯 Built with ❤️ by DeepSeek-AI • Pushing the boundaries of artificial intelligence