Update README.md to correct grammatical tense

Minor changes to correct grammatical tense for activities that took place in the past.
Benjamin Winkler 2025-02-07 00:58:05 -05:00 committed by GitHub
parent 7ca5e1e7f7
commit 30f7c85c75


@@ -65,7 +65,7 @@ To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSe
 **Post-Training: Large-Scale Reinforcement Learning on the Base Model**
-- We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
+- We directly applied reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
 - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities.
 We believe the pipeline will benefit the industry by creating better models.
@@ -109,7 +109,7 @@ For more details regarding the model architecture, please refer to [DeepSeek-V3]
 </div>
 DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
-We slightly change their configs and tokenizers. Please use our setting to run these models.
+We slightly changed their configs and tokenizers. Please use our setting to run these models.
 ## 4. Evaluation Results
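
The sentence edited in the second hunk ("Please use our setting to run these models") amounts to loading the config and tokenizer shipped in the DeepSeek repository rather than those of the original base model. The snippet below is a minimal sketch, not part of the commit; it assumes the Hugging Face `transformers` library is installed, and the model id is only an illustrative example.

```python
# Minimal sketch (assumptions: `transformers` is installed and the example
# model id below is reachable). Loading both the tokenizer and the model from
# the same repo id picks up the repo's own config and tokenizer, which is what
# the README's "use our setting" note asks for.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # example id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)                           # repo's tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")    # repo's config

inputs = tokenizer("What is 7 * 6?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```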