Update README.md

Fixed some spelling errors and incorrect grammar
Ishan Karpe 2025-01-21 11:07:44 -08:00 committed by GitHub
parent fdf883c014
commit a7f72e1aea
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194


@@ -49,12 +49,12 @@
## 1. Introduction
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
-DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
+DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable reasoning performance.
With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
-However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
-we introduce DeepSeek-R1, which incorporates cold-start data before RL.
+However, DeepSeek-R1-Zero encounters challenges like endless repetition, poor readability, and language mixing.
+We introduce DeepSeek-R1, which incorporates cold-start data before RL to address these issues and enhance reasoning performance.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
-To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
+We have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen to support the research community. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
<p align="center">
<img width="80%" src="figures/benchmark.jpg">
@@ -92,7 +92,7 @@ To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSe
</div>
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
-For more details regrading the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
+For more details regarding the model architecture, please refer to the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
### DeepSeek-R1-Distill Models
@@ -109,13 +109,13 @@ For more details regrading the model architecture, please refer to [DeepSeek-V3]
</div>
-DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
-We slightly change their configs and tokenizers. Please use our setting to run these models.
+DeepSeek-R1-Distill models are fine-tuned based on open-source models using samples generated by DeepSeek-R1.
+We slightly change their configs and tokenizers. Please use our settings to run these models.
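As a side note on the configs-and-tokenizers line above: below is a minimal, hypothetical sketch of loading a distilled checkpoint together with the config and tokenizer shipped alongside it, assuming inference through the Hugging Face `transformers` library. The model id, prompt, and generation parameters are illustrative only and are not the repository's own serving setup.

```python
# Rough sketch (assumed setup, not the repo's evaluation code): load a distilled
# checkpoint with its bundled config and tokenizer via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"  # illustrative Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)     # tokenizer shipped with the checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype recorded in the bundled config
    device_map="auto",    # requires `accelerate`; places weights across available devices
)

prompt = "How many positive divisors does 360 have?"    # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, do_sample=True, temperature=0.6, top_p=0.95, max_new_tokens=2048
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```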
## 4. Evaluation Results
### DeepSeek-R1-Evaluation
-For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
+The maximum generation length for all our models is 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1.
<div align="center">
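The revised evaluation line above fixes the sampling setup: a 32,768-token generation limit, temperature $0.6$, top-p $0.95$, and 64 responses per query. As a rough sketch of what estimating pass@1 from those samples can look like, the snippet below averages per-sample correctness across queries; the `generate` and `is_correct` callables and the data layout are hypothetical placeholders, not code from this repository.

```python
# Minimal sketch (assumed interfaces): estimate pass@1 by sampling NUM_SAMPLES
# responses per query with the settings described above and averaging correctness.
from statistics import mean

NUM_SAMPLES = 64          # responses generated per query
TEMPERATURE = 0.6         # sampling temperature
TOP_P = 0.95              # nucleus-sampling threshold
MAX_NEW_TOKENS = 32_768   # maximum generation length in tokens


def estimate_pass_at_1(queries, generate, is_correct):
    """Average per-sample correctness over NUM_SAMPLES responses for each query."""
    per_query_scores = []
    for query in queries:
        responses = [
            generate(query, temperature=TEMPERATURE, top_p=TOP_P,
                     max_tokens=MAX_NEW_TOKENS)
            for _ in range(NUM_SAMPLES)
        ]
        scores = [1.0 if is_correct(query, r) else 0.0 for r in responses]
        per_query_scores.append(mean(scores))   # pass@1 estimate for this query
    return mean(per_query_scores)               # benchmark-level pass@1
```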