
# 🚀 DeepSeek-V3: The Future of AI is Here

Homepage • Chat • Hugging Face • Discord • Paper


## 📊 Model at a Glance

| 🔥 Metric | 💎 Value | 🎯 Description |
|---|---|---|
| 🧠 Total Parameters | 671B | Massive scale for unprecedented capabilities |
| ⚡ Activated Parameters | 37B | Efficient MoE activation per token |
| 📝 Context Length | 128K | Extended context for complex tasks |
| 🎓 Training Tokens | 14.8T | Diverse, high-quality training data |
| ⏱️ Training Time | 2.788M H800 GPU hours | Remarkably efficient training |
| 🏆 MATH-500 Score | 90.2% | State-of-the-art mathematical reasoning |

## 🌟 Revolutionary Features

```
🚀 DeepSeek-V3 Architecture Overview
│
├── 🧠 Innovative Architecture
│   ├── 🔄 Auxiliary-Loss-Free Load Balancing
│   ├── 🎲 Multi-Token Prediction (MTP)
│   └── 🏗️ Multi-Head Latent Attention
│
├── ⚡ Training Efficiency
│   ├── 🔢 FP8 Mixed Precision Training
│   ├── 📡 Computation-Communication Overlap
│   └── 💎 Zero Loss Spikes/Rollbacks
│
└── 🎯 Superior Performance
    ├── 🧮 Mathematics Excellence
    ├── 💻 Code Generation Mastery
    └── 🤔 Advanced Reasoning
```
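The auxiliary-loss-free load balancing listed above can be sketched in a few lines. This is a toy illustration of the bias-based routing idea described in the technical report, not DeepSeek-V3's implementation: the expert count, top-k, update rate `GAMMA`, and Gaussian affinity scores are all hypothetical stand-ins.

```python
import random

random.seed(0)

NUM_EXPERTS = 8
TOP_K = 2
GAMMA = 0.01  # bias update speed; illustrative, not the paper's value

# One routing bias per expert, adjusted online instead of via an auxiliary loss.
bias = [0.0] * NUM_EXPERTS

def route(scores):
    """Top-k experts by affinity score + bias. The bias steers selection
    only; real gating weights would still come from the raw scores."""
    ranked = sorted(range(NUM_EXPERTS),
                    key=lambda e: scores[e] + bias[e], reverse=True)
    return ranked[:TOP_K]

def train_step(batch_scores):
    """Route one batch, then nudge each expert's bias toward balanced load."""
    loads = [0] * NUM_EXPERTS
    for scores in batch_scores:
        for e in route(scores):
            loads[e] += 1
    mean = sum(loads) / NUM_EXPERTS
    for e in range(NUM_EXPERTS):
        bias[e] += GAMMA if loads[e] < mean else -GAMMA
    return loads

def make_batch(n=512):
    # Skewed token-to-expert affinities: experts 0 and 1 are preferred.
    return [[random.gauss(1.0 if e < 2 else 0.0, 0.5) for e in range(NUM_EXPERTS)]
            for _ in range(n)]

before = train_step(make_batch())
for _ in range(300):
    train_step(make_batch())
after = train_step(make_batch())
print("loads before balancing:", before)
print("loads after balancing: ", after)
```

Because balance is restored by nudging a non-differentiable routing bias, no balancing term has to compete with the language-modeling loss — which is the point of the "auxiliary-loss-free" design.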

## 🏆 Performance Benchmarks

### 📚 Academic Excellence

| 🎯 Benchmark | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 📖 MMLU (Accuracy) | 78.4% | 85.0% | 84.4% | 🏆 87.1% |
| 🧮 MATH (Exact Match) | 43.4% | 54.4% | 49.0% | 🏆 61.6% |
| 🧠 BBH (Exact Match) | 78.8% | 79.8% | 82.9% | 🏆 87.5% |
| 📊 DROP (F1 Score) | 80.4% | 80.6% | 86.0% | 🏆 89.0% |

### 💻 Code Generation Mastery

| 🎯 Benchmark | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 👨‍💻 HumanEval (Pass@1) | 43.3% | 53.0% | 54.9% | 🏆 65.2% |
| 🔧 MBPP (Pass@1) | 65.0% | 72.6% | 68.4% | 🏆 75.4% |
| 🏃‍♂️ LiveCodeBench (Pass@1) | 11.6% | 12.9% | 15.5% | 🏆 19.4% |

### 🎭 Chat Model Excellence

| 🎯 Benchmark | 🤖 GPT-4o | 🎭 Claude-3.5-Sonnet | 🦙 LLaMA3.1 405B | 🥇 DeepSeek-V3 |
|---|---|---|---|---|
| 🏟️ Arena-Hard | 80.4 | 85.2 | 69.3 | 🏆 85.5 |
| 🦙 AlpacaEval 2.0 | 51.1% | 52.0% | 40.5% | 🏆 70.0% |
| 📐 AIME 2024 | 9.3% | 16.0% | 23.3% | 🏆 39.2% |
| 🧮 MATH-500 | 74.6% | 78.3% | 73.8% | 🏆 90.2% |

## 📦 Model Downloads

### 🎯 Choose Your Model

| 🤖 Model | 📊 Parameters | 🔗 Download | 🎯 Use Case |
|---|---|---|---|
| 🔬 DeepSeek-V3-Base | 671B (37B active) | 🤗 Download | Research & Fine-tuning |
| 💬 DeepSeek-V3-Chat | 671B (37B active) | 🤗 Download | Conversations & Applications |

### 🌐 Try Online

🌐 Chat Interface • 🔌 API Platform

💡 Experience the power of DeepSeek-V3 without any setup!
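For the API route, DeepSeek exposes an OpenAI-compatible `chat/completions` endpoint. The sketch below builds such a request with only the standard library; the base URL and model id follow DeepSeek's public API documentation at the time of writing and should be verified against the current docs, and `DEEPSEEK_API_KEY` is a placeholder environment variable.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completion request against the DeepSeek API.
API_BASE = "https://api.deepseek.com"
MODEL = "deepseek-chat"  # the DeepSeek-V3 chat model

payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Multi-Token Prediction in one sentence."},
    ],
    "temperature": 0.7,
}

api_key = os.environ.get("DEEPSEEK_API_KEY")  # placeholder variable name
if api_key:
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("DEEPSEEK_API_KEY not set; payload:", json.dumps(payload)[:60], "...")
```

Any OpenAI-compatible client library should also work by pointing its base URL at the endpoint above.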


## 🚀 Local Deployment Options

| 🛠️ Framework | 💫 Features | 🎯 Best For | 📱 Status |
|---|---|---|---|
| 🌊 SGLang | MLA optimizations, FP8, Multi-node TP | Production | ✅ |
| 🚀 LMDeploy | FP8/BF16, Cloud deployment | Enterprise | ✅ |
| ⚡ TensorRT-LLM | INT4/8 quantization, NVIDIA optimization | High Performance | ✅ |
| 🌪️ vLLM | Pipeline parallelism, Multi-GPU | Scalability | ✅ |
| 💡 LightLLM | Multi-node, Mixed precision | Flexibility | ✅ |

### 🖥️ Hardware Support

| 🔧 Platform | 💻 Hardware | 🎨 Precision | 📋 Framework |
|---|---|---|---|
| 🟢 NVIDIA GPUs | H100, H800, A100 | FP8, BF16, INT4/8 | All frameworks |
| 🔴 AMD GPUs | MI300X, MI250X | FP8, BF16 | SGLang, vLLM |
| 🟠 Huawei Ascend | 910B NPUs | BF16, INT8 | MindIE |

## Quick Start

### 🐍 1. Installation

```shell
# Clone the repository
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Install dependencies
pip install -r requirements.txt
```

### 🔧 2. Model Conversion

```shell
# Convert HuggingFace weights
python convert.py \
  --hf-ckpt-path /path/to/DeepSeek-V3 \
  --save-path /path/to/DeepSeek-V3-Demo \
  --n-experts 256 \
  --model-parallel 16
```

### 🎯 3. Run Inference

```shell
# Interactive chat
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --interactive --temperature 0.7

# Batch processing
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR \
  generate.py --ckpt-path /path/to/DeepSeek-V3-Demo \
  --config configs/config_671B.json --input-file $FILE
```

## 🏗️ Architecture Deep Dive

### 🧠 Core Innovations

```
┌─────────────────────────────────────────────────────────────┐
│                   🚀 DeepSeek-V3 Architecture               │
├─────────────────────────────────────────────────────────────┤
│  🔄 Auxiliary-Loss-Free Load Balancing                      │
│   ├── ⚖️  Minimizes performance degradation                 │
│   └── 🎯 Optimal expert utilization                         │
│                                                             │
│  🎲 Multi-Token Prediction (MTP)                            │
│   ├── 🚀 Enhanced model performance                         │
│   └── ⚡ Speculative decoding acceleration                  │
│                                                             │
│  🔢 FP8 Mixed Precision Training                            │
│   ├── 💎 First extreme-scale validation                     │
│   └── ⚡ Ultimate training efficiency                       │
│                                                             │
│  🧠 Knowledge Distillation from DeepSeek-R1                 │
│   ├── 🔗 Long-Chain-of-Thought integration                  │
│   └── 🎯 Reasoning capability enhancement                   │
└─────────────────────────────────────────────────────────────┘
```
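The "speculative decoding acceleration" that MTP enables can be illustrated with a toy draft-and-verify loop. The two "models" below are cheap stand-in functions, not DeepSeek-V3; only the accept/verify control flow reflects the actual technique.

```python
def target_next(prefix):
    """Stand-in for the full model's greedy next-token choice."""
    return (sum(prefix) * 31 + 7) % 50

def draft_next(prefix):
    """Stand-in MTP head: agrees with the target except every 4th position."""
    t = target_next(prefix)
    return t if len(prefix) % 4 else (t + 1) % 50

def greedy_decode(prefix, steps):
    """Baseline: one expensive target-model call per generated token."""
    out = list(prefix)
    for _ in range(steps):
        out.append(target_next(out))
    return out, steps  # (tokens, target-model calls)

def speculative_decode(prefix, steps, k=4):
    """Draft k tokens cheaply, then verify them with ONE batched target
    pass (a transformer scores every draft position in a single forward),
    keeping the agreed prefix plus the target's own token at the first
    disagreement. Output is identical to greedy_decode."""
    out = list(prefix)
    calls = 0
    produced = 0
    while produced < steps:
        draft = []
        while len(draft) < k and produced + len(draft) < steps:
            draft.append(draft_next(out + draft))
        calls += 1  # one batched verification pass for the whole draft
        cur = list(out)
        for tok in draft:
            t = target_next(cur)
            cur.append(t)  # t == tok whenever the draft is accepted
            produced += 1
            if t != tok:
                break      # first mismatch: keep the correction, drop the rest
        out = cur
    return out, calls

spec_out, spec_calls = speculative_decode([1, 2, 3], steps=32)
base_out, base_calls = greedy_decode([1, 2, 3], steps=32)
print("outputs identical:", spec_out == base_out)
print(f"target passes: {spec_calls} speculative vs {base_calls} greedy")
```

With greedy verification the speculative output matches plain greedy decoding token for token; the saving comes entirely from verifying several drafted tokens per expensive target pass.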

### 📈 Training Efficiency

| 🎯 Metric | 💎 Achievement | 🏆 Industry Impact |
|---|---|---|
| ⏱️ Training Time | 2.664M H800 GPU hours (pre-training; 2.788M total) | Exceptional efficiency at 671B scale |
| 📊 Data Volume | 14.8T high-quality tokens | Comprehensive knowledge base |
| 🎯 Stability | Zero loss spikes or rollbacks | Unprecedented training stability |
| 💰 Cost Efficiency | Economical pre-training | Accessible large-scale AI |
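The technical report prices training at an assumed rental rate of $2 per H800 GPU hour, which makes the headline cost figures easy to reproduce:

```python
# Back-of-the-envelope training cost, using the rental rate assumed in the
# DeepSeek-V3 technical report ($2 per H800 GPU hour).
PRE_TRAINING_HOURS = 2.664e6   # pre-training on 14.8T tokens
TOTAL_HOURS = 2.788e6          # plus context extension and post-training
RATE_USD_PER_HOUR = 2.0        # the report's assumed H800 rental price

pre_cost = PRE_TRAINING_HOURS * RATE_USD_PER_HOUR
total_cost = TOTAL_HOURS * RATE_USD_PER_HOUR
print(f"pre-training: ${pre_cost / 1e6:.3f}M")   # $5.328M
print(f"total:        ${total_cost / 1e6:.3f}M")  # $5.576M
```

These match the roughly $5.6M total the report quotes; note the rate is the report's pricing assumption, not an invoice.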

## 🎨 Context Window Performance

### 🔍 Needle in a Haystack (NIAH) Results

```
Context Length Performance
████████████████████████████████████████ 128K ✅ Perfect
██████████████████████████████████████   96K  ✅ Excellent
████████████████████████████████████     64K  ✅ Excellent
██████████████████████████████████       32K  ✅ Perfect
████████████████████████████             16K  ✅ Perfect
████████████████████                      8K  ✅ Perfect
████████████                              4K  ✅ Perfect
```

🏆 DeepSeek-V3 maintains excellent performance across all context lengths up to 128K tokens.

## 📄 Research & Citation

### 📚 Technical Paper

📄 Read Paper: https://arxiv.org/abs/2412.19437

### 📖 Citation

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
    title={DeepSeek-V3 Technical Report},
    author={DeepSeek-AI},
    year={2024},
    eprint={2412.19437},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2412.19437},
}
```

## 📜 License & Usage

Code License • Model License

Commercial use is fully supported for both the Base and Chat models.


## 🌟 Community & Support

### 🤝 Join the Community

🌐 Homepage • 💬 Discord • 🐦 Twitter • 📧 Email • 💬 WeChat


## 🚀 Ready to Explore the Future?

DeepSeek-V3 represents a leap forward in artificial intelligence, combining unprecedented scale with remarkable efficiency. Join thousands of researchers, developers, and innovators who are already building the future with DeepSeek-V3.

🌟 Star this repo • 👁️ Watch for updates • 🍴 Fork and contribute

---

🎯 Built with ❤️ by DeepSeek-AI • Pushing the boundaries of artificial intelligence