mirror of https://github.com/deepseek-ai/DeepSeek-R1.git, synced 2025-04-18 09:38:59 -04:00
Update README.md
Behaviors are exhibited rather than emerged.
parent bb10d07b27
commit 1c10c9f677
@@ -47,7 +47,7 @@
 We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
 DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
-With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
+With RL, DeepSeek-R1-Zero naturally exhibited numerous powerful and interesting reasoning behaviors.
 However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
 we introduce DeepSeek-R1, which incorporates cold-start data before RL.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.