We provide a test script to evaluate the performance of the **deepseek-coder** model on code generation benchmarks, [**MBPP**](https://huggingface.co/datasets/mbpp), with 3-shot setting.
We've created a sample script, **eval.sh**, that demonstrates how to test the **deepseek-coder-1.3b-base** model on the MBPP dataset leveraging **8** GPUs.
We report experimental results here for several models. We set the maximum input length to **4096** and the maximum output length to **500**, and employ the **greedy search strategy**.