awesome-deepseek-integration/docs/curator/README_cn.md
Shreyas Pimpalgaonkar 1547c531a2 add curator
2025-01-27 11:11:52 -08:00

![image](https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-crop.png)
# [Curator](https://github.com/bespokelabsai/curator)
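Curator is an open-source tool for creating and managing scalable datasets, both for post-training large language models (LLMs) and for structured data extraction.
Curator was used to create [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a reasoning dataset used to train the fully open-source reasoning model [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).
### Curator supports:
- Scalable synthetic data curation by calling the DeepSeek API
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Cost savings with batch mode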
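
To make the "caching and automatic recovery" item concrete, here is a minimal standard-library sketch of the general idea: responses are stored on disk keyed by a hash of the prompt, so a rerun after an interruption only calls the model for prompts it has not seen. This is an illustration under stated assumptions, not Curator's actual implementation or API. The names `cached_generate` and `fake_model` and the JSON cache file are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def cached_generate(prompts, call_model, cache_path):
    """Return one response per prompt, reusing responses saved in cache_path.

    Hypothetical helper for illustration only; it is not part of Curator.
    """
    cache_path = Path(cache_path)
    # Load earlier results if a previous run left a cache file behind.
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    else:
        cache = {}

    results = []
    for prompt in prompts:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = call_model(prompt)
            # Persist after every new response so an interrupted run can resume.
            cache_path.write_text(
                json.dumps(cache, ensure_ascii=False), encoding="utf-8"
            )
        results.append(cache[key])
    return results


def fake_model(prompt):
    # Hypothetical stand-in for a real LLM API call.
    return prompt.upper()
```

Calling `cached_generate(["a", "b"], fake_model, "cache.json")` twice invokes `fake_model` only during the first call; the second call is served entirely from the cache file.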
### Call the DeepSeek API easily with Curator
![image](https://pbs.twimg.com/media/GiLHb-xasAAbs4m?format=jpg&name=4096x4096)
# Get started here
- [Colab example](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
- [GitHub repository](https://github.com/bespokelabsai/curator)
- [Documentation](https://docs.bespokelabs.ai/)
- [Discord](https://discord.com/invite/KqpXvpzVBS)