Fixes #5

Correct typos and improve phrasing in `README.md`.
* Change "reinforcement learning (RL)" to "RL" after the first mention.
* Change "to generate 800K data" to "to generate 800K data samples".
* Change "reject sampling" to "rejection sampling".
* Correct the section header to "Conclusion, Limitations, and Future Work".
* Correct the typo "parapraphs" to "paragraphs".
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/deepseek-ai/DeepSeek-R1/issues/5?shareId=XXXX-XXXX-XXXX-XXXX).