# DeepSeek-R1

## 1. Introduction
We introduce our first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without preliminary supervised fine-tuning (SFT), demonstrates remarkable reasoning performance. Through RL training, it naturally developed numerous powerful and intriguing reasoning behaviors. However, DeepSeek-R1-Zero faces challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning capabilities, we developed DeepSeek-R1, which incorporates cold-start data prior to RL training. DeepSeek-R1 achieves performance comparable to OpenAI-o1 in math, coding, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (based on Llama and Qwen architectures) distilled from DeepSeek-R1. Notably, DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across benchmarks, achieving new state-of-the-art results for dense models.
## 2. Model Summary

### Post-Training: Large-Scale Reinforcement Learning on the Base Model
- We directly apply reinforcement learning (RL) to the base model without supervised fine-tuning (SFT) as a preliminary step. This approach enables the model to explore chain-of-thought (CoT) reasoning for solving complex problems, leading to the development of DeepSeek-R1-Zero. The model demonstrates capabilities such as self-verification, reflection, and the generation of long CoTs, marking a significant milestone for the research community. Notably, this is the first open research initiative to validate that large language models (LLMs) can develop reasoning capabilities purely through RL, eliminating the need for SFT. This breakthrough paves the way for future advancements in the field. (A toy reward sketch follows this list.)
- We introduce our pipeline for developing DeepSeek-R1, which incorporates two RL stages (aimed at discovering improved reasoning patterns and aligning with human preferences) and two SFT stages (serving as the foundation for the model's reasoning and non-reasoning capabilities). We believe this pipeline will benefit the industry by enabling the creation of more advanced models.
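For intuition only, a rule-based reward for reasoning RL can be sketched as below. The `<think>` tags, the `\boxed{}` answer check, and the score weights are illustrative assumptions for this README, not the actual training reward.

```python
import re

# Toy rule-based reward for reasoning RL: a format bonus for keeping the
# chain of thought inside <think>...</think>, plus an accuracy reward when
# the final boxed answer matches the reference. Weights are assumptions.
def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: chain of thought enclosed in <think>...</think> tags.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        score += 0.1
    # Accuracy reward: the final \boxed{} answer matches the reference.
    match = re.search(r"\\boxed\{([^{}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score
```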
### Distillation: Smaller Models Can Be Powerful Too
- We demonstrate that reasoning patterns from larger models can be distilled into smaller ones, achieving superior performance compared to reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1 and its API will empower the research community to distill more capable smaller models in the future.
- Using reasoning data generated by DeepSeek-R1, we fine-tuned several dense models widely adopted in the research community. Evaluations show that these smaller distilled dense models excel on benchmarks. We have open-sourced distilled checkpoints (1.5B, 7B, 8B, 14B, 32B, and 70B) based on the Qwen2.5 and Llama3 architectures for community use. (A minimal fine-tuning sketch follows this list.)
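To make the recipe concrete: distillation here amounts to standard supervised fine-tuning on teacher-generated traces. The sketch below assumes a handful of (prompt, response) pairs already sampled from DeepSeek-R1; the student model name, learning rate, and toy dataset are placeholders, not the released checkpoints' training setup.

```python
# Minimal distillation-as-SFT sketch: fine-tune a small dense model on
# teacher-generated reasoning traces with a plain causal-LM loss.
# Student model, hyperparameters, and the toy dataset are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-Math-1.5B"  # placeholder student model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# (prompt, teacher response) pairs sampled from DeepSeek-R1 (hypothetical).
pairs = [("If x + 3 = 7, what is x?", "<think>x = 7 - 3 = 4</think> \\boxed{4}")]

model.train()
for prompt, response in pairs:
    batch = tokenizer(prompt + response + tokenizer.eos_token, return_tensors="pt")
    # Next-token cross-entropy over the whole sequence; masking the prompt
    # tokens out of the loss is a common refinement omitted here.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```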
## 3. Model Downloads

### DeepSeek-R1 Models
| Model | #Total Params | #Activated Params | Context Length | Download |
| --- | --- | --- | --- | --- |
| DeepSeek-R1-Zero | 671B | 37B | 128K | 🤗 HuggingFace |
| DeepSeek-R1 | 671B | 37B | 128K | 🤗 HuggingFace |
DeepSeek-R1-Zero and DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
### DeepSeek-R1-Distill Models
| Model | Base Model | Download |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 HuggingFace |
DeepSeek-R1-Distill models are fine-tuned from open-source models, using samples generated by DeepSeek-R1. We slightly changed their configs and tokenizers; please use our settings when running these models.
## 4. Evaluation Results

### DeepSeek-R1-Evaluation
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1.
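Under this protocol, pass@1 is simply the fraction of the 64 samples judged correct, averaged over queries. A minimal sketch follows; `sample_response` and `is_correct` are hypothetical stand-ins for the model call and each benchmark's answer checker.

```python
# Sketch of the sampling protocol above: 64 samples per query at
# temperature 0.6 / top-p 0.95, with pass@1 the mean per-sample accuracy.
# sample_response and is_correct are hypothetical helper functions.
def pass_at_1(queries, k=64):
    scores = []
    for query in queries:
        correct = sum(
            is_correct(query, sample_response(query, temperature=0.6, top_p=0.95))
            for _ in range(k)
        )
        scores.append(correct / k)
    return sum(scores) / len(scores)  # average over the benchmark
```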
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Architecture | - | - | MoE | - | - | MoE |
| | # Activated Params | - | - | 37B | - | - | 37B |
| | # Total Params | - | - | 671B | - | - | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | 92.9 |
| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | 84.0 |
| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | - | 83.3 |
| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 |
| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | 82.5 |
| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | 87.6 |
| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | 65.9 |
| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| | SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 |
| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | 78.8 |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | 92.8 |
| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | 91.8 |
| | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | - | 63.7 |
### Distilled Model Evaluation
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
## 5. Chat Website & API Platform

You can chat with DeepSeek-R1 on DeepSeek's official website, chat.deepseek.com, by switching on the "DeepThink" button.

We also provide an OpenAI-compatible API at the DeepSeek Platform: platform.deepseek.com.
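As a sketch, the API can be called with the standard OpenAI Python client. The base URL and the `deepseek-reasoner` model id below are assumptions to verify against the platform documentation.

```python
# Hedged example of calling the OpenAI-compatible API.
# base_url and model id are assumptions; confirm on platform.deepseek.com.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model id for DeepSeek-R1
    messages=[{"role": "user", "content": "How many primes are there below 20?"}],
)
print(response.choices[0].message.content)
```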
## 6. How to Run Locally

### DeepSeek-R1 Models

Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally.

### DeepSeek-R1-Distill Models
DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
For instance, you can easily start a service using vLLM:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
```
You can also easily start a service using SGLang:

```bash
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
```
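Both servers expose an OpenAI-compatible endpoint, so a minimal client call might look like the following. Port 8000 and the `/v1` route are vLLM defaults rather than anything specified here; adjust for SGLang (port 30000 by default).

```python
# Query the locally served distilled model through its OpenAI-compatible API.
# Port 8000 and /v1 are vLLM defaults; adjust for your server.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{
        "role": "user",
        "content": "Solve x^2 - 5x + 6 = 0. Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```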
### Usage Recommendations

We recommend adhering to the following configurations when using the DeepSeek-R1 series models, including during benchmarking, to achieve the expected performance (a sketch applying these settings follows the list):
- Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "put your final answer within \boxed{}".
- When evaluating model performance, it is recommended to conduct multiple tests and average the results.
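Put together, a local generation call honoring these recommendations might look like the minimal sketch below, assuming Hugging Face transformers; the checkpoint name is one of the released distills.

```python
# Sketch applying the recommendations above: no system prompt, all
# instructions in the user turn, temperature 0.6, and a \boxed{} directive.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

# No system message: every instruction goes into the user prompt.
messages = [{"role": "user",
             "content": "What is 17 * 23? Please reason step by step, "
                        "and put your final answer within \\boxed{}."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=4096, do_sample=True,
                        temperature=0.6, top_p=0.95)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```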
## 7. License

This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B, and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen2.5 series, which is originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license.
## 8. Citation

## 9. Contact
If you have any questions, please raise an issue or contact us at service@deepseek.com.