Mirror of https://github.com/deepseek-ai/DeepSeek-R1.git, synced 2025-05-02 00:19:06 -04:00

Update README.md

Fixed some spelling errors and incorrect grammar.

parent fdf883c014 · commit a7f72e1aea

 README.md | 18 +++++++++---------
@@ -49,12 +49,12 @@
 ## 1. Introduction
 
 We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
-DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
+DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable reasoning performance.
 With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
-However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
-we introduce DeepSeek-R1, which incorporates cold-start data before RL.
+However, DeepSeek-R1-Zero encounters challenges like endless repetition, poor readability, and language mixing.
+We introduce DeepSeek-R1, which incorporates cold-start data before RL to address these issues and enhance reasoning performance.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
-To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
+We have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen to support the research community. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
 
 <p align="center">
   <img width="80%" src="figures/benchmark.jpg">
@@ -92,7 +92,7 @@ To support the research community, we have open-sourced DeepSe
 </div>
 
 DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
-For more details regrading the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
+For more details regarding the model architecture, please refer to the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
 
 ### DeepSeek-R1-Distill Models
 
@@ -104,18 +104,18 @@ For more details regrading the model architecture, please refer to [DeepSeek-V3]
 | DeepSeek-R1-Distill-Qwen-7B | [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
 | DeepSeek-R1-Distill-Llama-8B | [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
 | DeepSeek-R1-Distill-Qwen-14B | [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
-|DeepSeek-R1-Distill-Qwen-32B | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
+| DeepSeek-R1-Distill-Qwen-32B | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
 | DeepSeek-R1-Distill-Llama-70B | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 
 </div>
 
-DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
-We slightly change their configs and tokenizers. Please use our setting to run these models.
+DeepSeek-R1-Distill models are fine-tuned based on open-source models using samples generated by DeepSeek-R1.
+We slightly change their configs and tokenizers. Please use our settings to run these models.
 
 ## 4. Evaluation Results
 
 ### DeepSeek-R1-Evaluation
-For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
+The maximum generation length for all our models is 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
 <div align="center">
 
 
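For reference, the "settings" the diff refers to are the configs and tokenizers shipped with the distilled checkpoints. Below is a minimal sketch of running one of the distilled models from the table above with the sampling settings quoted in this diff (temperature 0.6, top-p 0.95, up to 32,768 generated tokens). It assumes the standard Hugging Face transformers API, not an official DeepSeek script.

```python
# Sketch: run a DeepSeek-R1-Distill model with the sampling settings
# quoted in the diff. Assumes the standard Hugging Face transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # any row from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Chat-style prompt; the distill models ship with slightly modified
# configs and tokenizers, as the README notes.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,       # sampling temperature from the evaluation setup
    top_p=0.95,            # nucleus sampling threshold from the evaluation setup
    max_new_tokens=32768,  # maximum generation length quoted in the README
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```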
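The evaluation paragraph estimates pass@1 by drawing 64 samples per query and averaging per-sample correctness over all queries. A small sketch of that estimator follows; `is_correct` is a hypothetical stand-in for a benchmark-specific grader (exact match, unit tests, etc.), not anything from the repository.

```python
# Sketch of the pass@1 estimate described in the diff: sample k responses
# per query, score each, and average the per-query accuracy.
from statistics import mean


def is_correct(response: str, reference: str) -> bool:
    # Hypothetical grader: real benchmarks use exact match or unit tests.
    return reference in response


def pass_at_1(responses_per_query: list[list[str]], answers: list[str]) -> float:
    """pass@1 as the fraction of correct samples per query, averaged over queries."""
    per_query = []
    for responses, answer in zip(responses_per_query, answers):
        correct = sum(is_correct(r, answer) for r in responses)  # k = 64 in the README
        per_query.append(correct / len(responses))
    return mean(per_query)


# Tiny usage example with 3 samples per query (the README uses 64):
print(pass_at_1([["408", "409", "the answer is 408"]], ["408"]))  # -> 0.666...
```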