mirror of
https://github.com/deepseek-ai/awesome-deepseek-integration.git
synced 2025-02-23 06:09:02 -05:00
add curator
This commit is contained in:
parent
bd3ef90cc1
commit
1547c531a2
12
README.md
12
README.md
@ -160,6 +160,18 @@ English/[简体中文](https://github.com/deepseek-ai/awesome-deepseek-integrati
|
|||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
|
|
||||||
|
### Synthetic data curation
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<td> <img src="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-crop.png" alt="Icon" width="64" height="auto" /> </td>
|
||||||
|
<td> <a href="https://github.com/deepseek-ai/awesome-deepseek-integration/blob/main/docs/curator/README.md"> Curator </a> </td>
|
||||||
|
<td> An open-source tool to curate large scale datasets for post-training LLMs. </td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
|
||||||
### IM Application Plugins
|
### IM Application Plugins
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
|
30
docs/curator/README.md
Normal file
30
docs/curator/README.md
Normal file
@ -0,0 +1,30 @@
|
|||||||
|
|
||||||
|
data:image/s3,"s3://crabby-images/c6bca/c6bca9ab37c9faec9cbf7c9646fa210807740bf1" alt="image"
|
||||||
|
|
||||||
|
|
||||||
|
# [Curator](https://github.com/bespokelabsai/curator)
|
||||||
|
|
||||||
|
|
||||||
|
Curator is an open-source tool to curate large scale datasets for post-training LLMs.
|
||||||
|
|
||||||
|
Curator was used to curate [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a reasoning dataset to train a fully open reasoning model [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).
|
||||||
|
|
||||||
|
|
||||||
|
### Curator supports:
|
||||||
|
|
||||||
|
- Calling Deepseek API for scalable synthetic data curation
|
||||||
|
- Easy structured data extraction
|
||||||
|
- Caching and automatic recovery
|
||||||
|
- Dataset visualization
|
||||||
|
- Saving $$$ using batch mode
|
||||||
|
|
||||||
|
### Call Deepseek API with Curator easily:
|
||||||
|
|
||||||
|
data:image/s3,"s3://crabby-images/407da/407dae3b2553bf4b984a124a27cf8c80dda211df" alt="image"
|
||||||
|
|
||||||
|
# Get Started here
|
||||||
|
|
||||||
|
- [Colab Example](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
|
||||||
|
- [Github Repo](https://github.com/bespokelabsai/curator)
|
||||||
|
- [Documentation](https://docs.bespokelabs.ai/)
|
||||||
|
- [Discord](https://discord.com/invite/KqpXvpzVBS)
|
29
docs/curator/README_cn.md
Normal file
29
docs/curator/README_cn.md
Normal file
@ -0,0 +1,29 @@
|
|||||||
|
data:image/s3,"s3://crabby-images/c6bca/c6bca9ab37c9faec9cbf7c9646fa210807740bf1" alt="image"
|
||||||
|
|
||||||
|
|
||||||
|
# [Curator](https://github.com/bespokelabsai/curator)
|
||||||
|
|
||||||
|
|
||||||
|
Curator 是一个用于后训练大型语言模型 (LLMs) 和结构化数据提取的制作与管理可扩展的数据集的开源工具。
|
||||||
|
|
||||||
|
Curator 被用来制作 [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k),这是一个用于训练完全开源的推理模型 [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation) 的推理数据集。
|
||||||
|
|
||||||
|
|
||||||
|
### Curator 支持:
|
||||||
|
|
||||||
|
- 调用 Deepseek API 进行可扩展的合成数据管理
|
||||||
|
- 简便的结构化数据提取
|
||||||
|
- 缓存和自动恢复
|
||||||
|
- 数据集可视化
|
||||||
|
- 使用批处理模式节省费用
|
||||||
|
|
||||||
|
### 轻松使用 Curator 调用 Deepseek API:
|
||||||
|
|
||||||
|
data:image/s3,"s3://crabby-images/407da/407dae3b2553bf4b984a124a27cf8c80dda211df" alt="image"
|
||||||
|
|
||||||
|
# 从这里开始
|
||||||
|
|
||||||
|
- [Colab 示例](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
|
||||||
|
- [Github 仓库](https://github.com/bespokelabsai/curator)
|
||||||
|
- [文档](https://docs.bespokelabs.ai/)
|
||||||
|
- [Discord](https://discord.com/invite/KqpXvpzVBS)
|
Loading…
Reference in New Issue
Block a user