Mirror of https://github.com/deepseek-ai/DeepSeek-Coder.git, synced 2025-02-22 21:59:11 -05:00
In README.md, the sample dataset format mentions the `instruction` and `output` fields, but an example JSON line would be helpful.
Parent: b7ba565956
Commit: ec0774bc3e
Changed file: README.md (14 lines added)
@@ -282,6 +282,20 @@ pip install -r finetune/requirements.txt
Please follow [Sample Dataset Format](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) to prepare your training data.
Each line is a JSON-serialized string with two required fields, `instruction` and `output`.
Below are two example records, one for a Python task and one for a SQL task. Each record must sit on a single line of the file.

### Python

```json
{"instruction": "Write a Python function to calculate factorial", "output": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n-1)"}
```

### SQL

```json
{"instruction": "Create a SQL query to find duplicate emails", "output": "SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1;"}
```
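
Writing records by hand makes it easy to break the one-record-per-line rule, for example by pretty-printing an object across several lines. A minimal sketch, assuming you build your records in Python: `json.dumps` escapes the newlines inside `output`, so each record lands on exactly one line. The helper `to_jsonl` and the file name `train_data.jsonl` are hypothetical, chosen only for illustration.

```python
import json

# The two example records from above, as Python dicts.
records = [
    {
        "instruction": "Write a Python function to calculate factorial",
        "output": (
            "def factorial(n):\n"
            "    if n == 0:\n"
            "        return 1\n"
            "    else:\n"
            "        return n * factorial(n-1)"
        ),
    },
    {
        "instruction": "Create a SQL query to find duplicate emails",
        "output": "SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1;",
    },
]


def to_jsonl(rows):
    """Serialize each record as one JSON line (hypothetical helper).

    json.dumps escapes the newline characters inside `output` as \\n,
    so the only real line breaks are the ones added between records.
    """
    return "".join(json.dumps(row) + "\n" for row in rows)


if __name__ == "__main__":
    # Hypothetical output file; pass its path as DATA_PATH later.
    with open("train_data.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```

Serializing with `json.dumps` rather than typing the lines out by hand also takes care of escaping quotes and backslashes that often appear in code samples.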

After preparing the data, you can use the sample shell script to finetune `deepseek-ai/deepseek-coder-6.7b-instruct`.
Remember to specify `DATA_PATH` and `OUTPUT_PATH`.
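
Before launching the script, it can help to check the file you pass as `DATA_PATH`. The following is a minimal sketch, not part of this repository: `validate_lines` and `validate_jsonl` are hypothetical helpers that only check the two required fields described above, and they do not reproduce any additional checks the sample script or its data loader may perform.

```python
import json

REQUIRED_FIELDS = ("instruction", "output")


def validate_lines(lines):
    """Count records in an iterable of JSON Lines strings.

    Raises ValueError at the first line that is not a single JSON object
    with string-valued `instruction` and `output` fields. A pretty-printed
    object spread over several lines fails here, because its first line
    ("{") is not valid JSON on its own.
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            raise ValueError(f"line {lineno}: empty line")
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                raise ValueError(f"line {lineno}: missing or non-string `{field}`")
        count += 1
    return count


def validate_jsonl(path):
    """Validate the file you intend to pass as DATA_PATH."""
    with open(path, encoding="utf-8") as f:
        return validate_lines(f)
```

For example, `validate_jsonl("train_data.jsonl")` (a hypothetical path) returns the number of records, or raises with the line number of the first malformed record.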