diff --git a/Evaluation/HumanEval/README.md b/Evaluation/HumanEval/README.md
index 1c8888a..cd91793 100644
--- a/Evaluation/HumanEval/README.md
+++ b/Evaluation/HumanEval/README.md
@@ -1,6 +1,6 @@
 ## 1. Introduction
 
-We provide a test script to evaluate the performance of the **deepseek-coder** model on code generation benchmarks. We select the widely-used benchmarks: **[HumanEval](https://huggingface.co/datasets/openai_humaneval), [MultiPL-E](https://huggingface.co/datasets/nuprl/MultiPL-E)**.
+We provide a test script to evaluate the performance of the **deepseek-coder** model on code generation benchmarks. We select the widely-used benchmarks: **[HumanEval-Python](https://huggingface.co/datasets/openai_humaneval), [HumanEval-Multilingual](https://huggingface.co/datasets/nuprl/MultiPL-E)**.
 
@@ -14,7 +14,6 @@
 pip install pytorch
 ```
 
-
 ## 3. Evaluation
 
 We've created a sample script, **eval.sh**, that demonstrates how to test the **deepseek-coder-1b-python** model on the HumanEval dataset leveraging **8** GPUs. If your use case involves a different model or dataset, simply adjust the script to fit your needs.
 
@@ -35,7 +34,6 @@ python -m accelerate.commands.launch --config_file test_config.yaml eval_pal.py
 
 We report experimental results here for 8 main-stream programming languages, **python**, **c++**, **java**, **PHP**, **TypeScript**, **C#**, **Bash**, and **JavaScript**. For all open-source models, we utilize this repository to obtain the performance of the models on the HumanEval dataset. We set the maximum input length to **4096** and the maximum output length to **500**, and employ the **greedy search strategy**.
-
 
 #### (1) Multilingual Base Models
 | Model | Size | Python | C++ | Java | PHP | TS | C# | Bash | JS | Avg |
@@ -55,13 +53,12 @@ We report experimental results here for 8 main-stream programming languages, **p
 #### (3) Instruction-Tuned Models
 | Model               | Size | Python | C++   | Java  | PHP   | TS    | C#    | Bash  | JS    | Avg   |
 |---------------------|------|--------|-------|-------|-------|-------|-------|-------|-------|-------|
-| ChatGPT             | -    | 70.7%  | 50.3% | 54.5% | 52.2% | 62.3% | 64.6% | 34.8% | 60.9% | 52.2% |
-| GPT-4               | -    | 82.3%  | 70.2% | 74.8% | 70.8% | 73.0% | 77.9% | 51.3% | 83.2% | 72.9% |
+| GPT35-turbo         | -    | 76.2%  | 63.4% | 69.2% | 60.9% | 69.1% | 70.8% | 42.4% | 67.1% | 64.9% |
+| GPT-4               | -    | 84.1%  | 76.4% | 81.6% | 77.2% | 77.4% | 79.1% | 58.2% | 78.0% | 76.5% |
 | WizardCoder         | 16B  | 51.8%  | 41.6% | 41.1% | 42.2% | 44.7% | 46.8% | 12.7% | 42.8% | 40.5% |
 | Phind-CodeLlama     | 34B  | -      | -     | -     | -     | -     | -     | -     | -     | -     |
 |                     |      |        |       |       |       |       |       |       |       |       |
 | OraCoder-Chat (1B)  | 1B   | -      | -     | -     | -     | -     | -     | -     | -     | -     |
-| OraCoder-Chat (7B)  | 7B   | -      | -     | -     | -     | -     | -     | -     | -     | -     |
-| OraCoder-Chat (33B) | 33B  | -      | -     | -     | -     | -     | -     | -     | -     | -     |
-
+| OraCoder-Chat (7B)  | 7B   | 78.9%  | 63.4% | 68.4% | 68.9% | 67.2% | 72.8% | 36.7% | 72.7% | 66.1% |
+| OraCoder-Chat (33B) | 33B  | 79.3%  | 68.9% | 73.4% | 72.7% | 67.9% | 74.1% | 43.0% | 73.9% | 69.2% |
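For reference (not taken from the patch or from `eval_pal.py`), here is a minimal sketch of generating one completion under the settings quoted above, a 4096-token input limit, up to 500 new tokens, and greedy search, using the Hugging Face `transformers` API. The checkpoint name is a placeholder.

```python
# Illustrative only: not the repository's eval_pal.py. Assumes a CUDA GPU and a
# placeholder checkpoint id; substitute the model you actually want to evaluate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-coder-1.3b-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).cuda()

prompt = 'def fib(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096).to(model.device)

# Greedy search: do_sample=False; cap the completion at 500 new tokens.
outputs = model.generate(
    **inputs,
    do_sample=False,
    max_new_tokens=500,
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion)
```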
diff --git a/README.md b/README.md
index f2ccc9b..15645de 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,9 @@
 Deepseek Coder comprises a series of code language models trained on both 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling.
 For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
-![result](pictures/result.png)
+
+![result](pictures/result.png)
+
 
 - **Massive Training Data**: Trained on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
@@ -29,7 +31,8 @@ Deepseek Coder comprises a series of code language models trained on both 87% co
 - Step 2: Parsing the dependencies of files within the same repository to rearrange the file positions based on their dependencies.
 - Step 3: Concatenating dependent files to form a single example and employ repo-level minhash for deduplication.
 - Step 4: Further filtering out low-quality code, such as codes with syntax errors or poor readability.
-- data_creation
+
+data_creation
 
 #### Model Training
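Step 3 in the hunk above mentions repo-level minhash deduplication. As a generic illustration only (the patch does not show the actual data pipeline), near-duplicate examples are commonly filtered with MinHash signatures plus locality-sensitive hashing, for instance via the `datasketch` library:

```python
# Generic near-duplicate filtering with MinHash/LSH (illustrative only; not the
# repository's actual pipeline). Requires `pip install datasketch`.
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in set(text.split()):  # crude whitespace shingling
        m.update(token.encode("utf-8"))
    return m

examples = {
    "repo_a": "def add(a, b):\n    return a + b\n",
    "repo_b": "def add(a, b):\n    return a + b\n",  # exact duplicate of repo_a
    "repo_c": "class Stack:\n    def __init__(self):\n        self.items = []\n",
}

lsh = MinHashLSH(threshold=0.85, num_perm=128)  # Jaccard threshold for "duplicate"
kept = []
for key, text in examples.items():
    mh = minhash_of(text)
    if lsh.query(mh):      # a near-duplicate is already kept, so drop this one
        continue
    lsh.insert(key, mh)
    kept.append(key)

print(kept)  # ['repo_a', 'repo_c']
```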
@@ -203,9 +206,18 @@ print(tokenizer.decode(outputs[0]))
 ```
 
 ### 5. Evaluation Results
+We evaluate DeepSeek Coder on various coding-related benchmarks.
+The `pass@1` results on HumanEval (Python and Multilingual), MBPP, and DS-1000 are reported as follows:
-The reproducible code for the following evaluation results can be found in the [Evaluation](https://github.com/deepseek-ai/deepseek-coder/tree/main/Evaluation) directory.
+
+![table](pictures/table.png)
+
+The results show that DeepSeek-Coder-Base-33B significantly outperforms existing open-source code LLMs. Compared with CodeLlama-34B, it leads by 7.9%, 9.3%, 10.8% and 5.9% respectively on HumanEval Python, HumanEval Multilingual, MBPP and DS-1000.
+Surprisingly, our DeepSeek-Coder-Base-7B reaches the performance of CodeLlama-34B.
+After instruction tuning, the DeepSeek-Coder-Instruct-33B model outperforms GPT35-turbo on HumanEval and achieves results comparable to GPT35-turbo on MBPP.
+
+More evaluation details and reproducible code for the above results can be found in the [Evaluation](https://github.com/deepseek-ai/deepseek-coder/tree/main/Evaluation) directory.
 
 ### 6. License
 This code repository is licensed under the MIT License. The use of DeepSeek Coder model and weights is subject to the Model License. DeepSeek Coder supports commercial use.
diff --git a/pictures/result.png b/pictures/result.png
index 75bf7dd..366c7bc 100644
Binary files a/pictures/result.png and b/pictures/result.png differ
diff --git a/pictures/table.png b/pictures/table.png
new file mode 100644
index 0000000..1c9d6f3
Binary files /dev/null and b/pictures/table.png differ
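The added text above reports `pass@1` numbers. For reference, the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021) is sketched below; with a single greedy sample per problem it reduces to the plain fraction of problems whose completion passes the unit tests.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Aggregate by averaging the per-problem estimates.
per_problem = [(1, 1), (1, 0), (1, 1), (1, 1)]  # (n, c) pairs; greedy decoding => n = 1
print(np.mean([pass_at_k(n, c, k=1) for n, c in per_problem]))  # 0.75
```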