mirror of
https://github.com/deepseek-ai/DeepSeek-Coder.git
synced 2025-02-23 14:19:09 -05:00
## 1. Introduction

We provide a test script to evaluate the performance of the **deepseek-coder** model on the **[MBPP](https://huggingface.co/datasets/mbpp)** code generation benchmark in a 3-shot setting.

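To make the 3-shot setting concrete, here is a minimal sketch of how a few-shot prompt could be assembled: three solved problems followed by the unsolved target. The field names `text`, `code`, and `test_list` follow the MBPP dataset, but the prompt layout and the helper names `format_problem` and `build_few_shot_prompt` are assumptions for illustration and may differ from what the evaluation script does.

```python
from typing import Dict, List


def format_problem(problem: Dict, with_solution: bool) -> str:
    """Render one problem as text; this layout is illustrative only."""
    tests = "\n".join(problem["test_list"])
    block = f"Task: {problem['text']}\nTests:\n{tests}\n"
    if with_solution:
        block += f"Solution:\n{problem['code']}\n"
    else:
        # Leave the solution open so the model completes it.
        block += "Solution:\n"
    return block


def build_few_shot_prompt(shots: List[Dict], target: Dict) -> str:
    """Concatenate the solved examples, then the unsolved target problem."""
    parts = [format_problem(s, with_solution=True) for s in shots]
    parts.append(format_problem(target, with_solution=False))
    return "\n".join(parts)


# Toy problems standing in for MBPP records (not taken from the dataset).
shots = [
    {"text": "Add two numbers.", "code": "def add(a, b):\n    return a + b",
     "test_list": ["assert add(1, 2) == 3"]},
    {"text": "Square a number.", "code": "def square(x):\n    return x * x",
     "test_list": ["assert square(3) == 9"]},
    {"text": "Negate a number.", "code": "def negate(x):\n    return -x",
     "test_list": ["assert negate(2) == -2"]},
]
target = {"text": "Double a number.", "test_list": ["assert double(4) == 8"]}

prompt = build_few_shot_prompt(shots, target)
print(prompt)
```

The prompt ends with an open `Solution:` line, so the model's completion is the candidate program for the target problem.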
## 2. Setup

```
pip install accelerate
pip install attrdict
pip install transformers
pip install torch
```

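After installing, a quick check like the following can confirm that the packages are visible to the interpreter. This is a small convenience sketch, not part of the repository. Note that PyTorch is imported as `torch`. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Import names of the packages needed for evaluation (PyTorch imports as `torch`).
REQUIRED = ["accelerate", "attrdict", "transformers", "torch"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", missing if missing else "none")
```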
## 3. Evaluation

We've created a sample script, **eval.sh**, that demonstrates how to test the **deepseek-coder-1.3b-base** model on the MBPP dataset using **8** GPUs.

```bash
MODEL_NAME_OR_PATH="deepseek-ai/deepseek-coder-1.3b-base"
DATASET_ROOT="data/"
LANGUAGE="python"
python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py --logdir ${MODEL_NAME_OR_PATH} --dataroot ${DATASET_ROOT}
```

## 4. Experimental Results

We report experimental results here for several models. We set the maximum input length to **4096** and the maximum output length to **500**, and employ the **greedy search strategy**.

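The decoding settings above can be sketched as generation arguments in the style of the Hugging Face `generate` API. Treat this as an assumption-laden illustration: mapping "maximum output length" to `max_new_tokens`, and truncating over-long prompts from the left, are guesses about the setup, and the helper name `truncate_prompt` is hypothetical.

```python
MAX_INPUT_TOKENS = 4096   # maximum input length stated above
MAX_OUTPUT_TOKENS = 500   # maximum output length stated above

# Keyword arguments in the style of Hugging Face `generate`:
# sampling disabled with a single beam corresponds to greedy search.
GENERATION_KWARGS = {
    "max_new_tokens": MAX_OUTPUT_TOKENS,
    "do_sample": False,
    "num_beams": 1,
}


def truncate_prompt(token_ids, limit=MAX_INPUT_TOKENS):
    """Keep at most `limit` tokens, dropping the oldest first (an assumption)."""
    return token_ids[-limit:] if len(token_ids) > limit else list(token_ids)


# Toy example: a 5000-token prompt is cut down to the last 4096 tokens.
toy_ids = list(range(5000))
print(len(truncate_prompt(toy_ids)))  # prints 4096
```

With a loaded model, these arguments would typically be passed as `model.generate(input_ids, **GENERATION_KWARGS)`.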
#### (1) Multilingual Base Models

| Model               | Size | Pass@1    |
|---------------------|------|-----------|
| CodeShell           | 7B   | 38.6%     |
| CodeGeeX2           | 6B   | 36.2%     |
| StarCoder           | 16B  | 42.8%     |
| CodeLlama-Base      | 7B   | 38.6%     |
| CodeLlama-Base      | 13B  | 47.0%     |
| CodeLlama-Base      | 34B  | 55.0%     |
|                     |      |           |
| DeepSeek-Coder-Base | 1.3B | 46.8%     |
| DeepSeek-Coder-Base | 5.7B | 57.2%     |
| DeepSeek-Coder-Base | 6.7B | 60.6%     |
| DeepSeek-Coder-Base | 33B  | **66.0%** |

#### (2) Instruction-Tuned Models

| Model                   | Size | Pass@1    |
|-------------------------|------|-----------|
| GPT-3.5-Turbo           | -    | 70.8%     |
| GPT-4                   | -    | **80.0%** |
|                         |      |           |
| DeepSeek-Coder-Instruct | 1.3B | 49.4%     |
| DeepSeek-Coder-Instruct | 5.7B | 62.4%     |
| DeepSeek-Coder-Instruct | 6.7B | 65.4%     |
| DeepSeek-Coder-Instruct | 33B  | **70.0%** |

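With greedy search, each problem gets exactly one completion, so Pass@1 reduces to the percentage of problems whose completion passes all of its tests. A minimal sketch of that computation follows. The function name `pass_at_1` and the toy outcomes are illustrative and are not taken from the evaluation script or from the results above.

```python
from typing import List


def pass_at_1(passed: List[bool]) -> float:
    """Percentage of problems whose single completion passed every test."""
    if not passed:
        raise ValueError("no results to score")
    return 100.0 * sum(passed) / len(passed)


# Toy outcomes for five problems (illustrative, not real benchmark data).
outcomes = [True, False, True, True, False]
score = pass_at_1(outcomes)
print(f"Pass@1: {score:.1f}%")  # prints "Pass@1: 60.0%"
```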