diff --git a/README.md b/README.md
index 09d2bda..2b7aff3 100644
--- a/README.md
+++ b/README.md
@@ -49,12 +49,12 @@
 ## 1. Introduction
 
 We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
-DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
+DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable reasoning performance.
 With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
-However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
-we introduce DeepSeek-R1, which incorporates cold-start data before RL.
+However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
+We introduce DeepSeek-R1, which incorporates cold-start data before RL to address these issues and enhance reasoning performance.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
-To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
+We have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen to support the research community. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
@@ -92,7 +92,7 @@ To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSe
 DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
-For more details regrading the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
+For more details regarding the model architecture, please refer to the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
 
 ### DeepSeek-R1-Distill Models
 
@@ -104,18 +104,18 @@ For more details regrading the model architecture, please refer to [DeepSeek-V3]
 | DeepSeek-R1-Distill-Qwen-7B | [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
 | DeepSeek-R1-Distill-Llama-8B | [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | DeepSeek-R1-Distill-Qwen-14B | [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
-|DeepSeek-R1-Distill-Qwen-32B | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
+| DeepSeek-R1-Distill-Qwen-32B | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
 | DeepSeek-R1-Distill-Llama-70B | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 
-DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
-We slightly change their configs and tokenizers. Please use our setting to run these models.
+DeepSeek-R1-Distill models are fine-tuned based on open-source models using samples generated by DeepSeek-R1.
+We slightly change their configs and tokenizers. Please use our settings to run these models (see the loading sketch below).
 
 ## 4. Evaluation Results
 
 ### DeepSeek-R1-Evaluation
 
- For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
+ The maximum generation length for all our models is 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
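+
+Under this protocol, pass@1 for a query is estimated as the average correctness over its 64 sampled responses, and the benchmark score is the mean of this estimate over all queries. A minimal sketch of the estimator (not our evaluation harness; the `is_correct` checker is a hypothetical stand-in for task-specific answer matching):
+
+```python
+from typing import Callable, List
+
+def estimate_pass_at_1(
+    responses: List[str],
+    reference: str,
+    is_correct: Callable[[str, str], bool],
+) -> float:
+    """Estimate pass@1 for one query as the fraction of the sampled
+    responses (here, 64 per query) judged correct."""
+    return sum(is_correct(r, reference) for r in responses) / len(responses)
+```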
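+
+To run a distilled model with the config and tokenizer shipped in our Hugging Face repo, here is a minimal loading sketch using `transformers` (assuming `transformers` and `accelerate` are installed; the prompt is illustrative, and the sampling settings follow the evaluation setup above):
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
+
+# Load the config and tokenizer from our repo rather than the base model's.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype="auto", device_map="auto"
+)
+
+messages = [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0."}]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+
+outputs = model.generate(
+    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
+)
+# Decode only the newly generated tokens.
+print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
+```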