Mirror of https://github.com/deepseek-ai/awesome-deepseek-integration.git (synced 2025-02-23 14:19:01 -05:00)
Curator
Curator is an open-source tool for curating large-scale datasets for post-training LLMs.
Curator was used to curate Bespoke-Stratos-17k, the reasoning dataset used to train Bespoke-Stratos, a fully open reasoning model.
Curator supports:
- Calling the DeepSeek API for scalable synthetic data curation
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Lower costs ($) using batch mode
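To illustrate the general idea behind "caching and automatic recovery", here is a minimal sketch, assuming a JSONL file as the cache: each response is keyed by a SHA-256 hash of the model name and prompt and appended to disk as soon as it arrives, so a restarted run skips prompts that already finished. This is a generic stdlib-only sketch, not Curator's actual implementation or API. `prompt_key`, `load_cache`, `cached_generate`, and the `call_model` callback are hypothetical names, and `call_model` stands in for a real DeepSeek API request.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def prompt_key(prompt: str, model: str) -> str:
    """Stable cache key: SHA-256 over the model name and prompt (hypothetical scheme)."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_cache(cache_path: Path) -> tuple[dict[str, str], bool]:
    """Read finished responses from a JSONL cache.

    Returns the cache and whether the file ends mid-line (a torn write).
    """
    if not cache_path.exists():
        return {}, False
    text = cache_path.read_text(encoding="utf-8")
    cache: dict[str, str] = {}
    for line in text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Blank or partially written lines from an interrupted run are skipped.
            continue
        cache[record["key"]] = record["response"]
    return cache, bool(text) and not text.endswith("\n")


def cached_generate(
    prompts: list[str],
    model: str,
    call_model: Callable[[str], str],
    cache_path: Path,
) -> list[str]:
    """Return one response per prompt, calling the model only on cache misses.

    Each new response is appended and flushed immediately, so a crashed run
    can be restarted with the same cache_path and resume where it stopped.
    """
    cache, torn = load_cache(cache_path)
    results: list[str] = []
    with cache_path.open("a", encoding="utf-8") as f:
        if torn:
            f.write("\n")  # start new records on a clean line
        for prompt in prompts:
            key = prompt_key(prompt, model)
            if key not in cache:
                cache[key] = call_model(prompt)
                f.write(json.dumps({"key": key, "response": cache[key]}) + "\n")
                f.flush()
            results.append(cache[key])
    return results


if __name__ == "__main__":
    calls: list[str] = []

    def fake_model(prompt: str) -> str:
        # Stand-in for a real API call; records every invocation.
        calls.append(prompt)
        return prompt.upper()

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cache.jsonl"
        first = cached_generate(["a", "b"], "deepseek-chat", fake_model, path)
        second = cached_generate(["a", "b", "c"], "deepseek-chat", fake_model, path)
        print(first, second, len(calls))  # ['A', 'B'] ['A', 'B', 'C'] 3
```

The second call only pays for the new prompt `"c"`. Flushing after every response trades a little I/O for durability: a crash loses at most the in-flight request. A production pipeline would likely also need concurrency and API retry handling, which this sketch omits.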