From b39e9db1380f71b0544a25125f45fe6843589238 Mon Sep 17 00:00:00 2001
From: Bingxuan Wang
Date: Thu, 30 Nov 2023 10:58:40 +0800
Subject: [PATCH] Update README.md

fix typos
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ec27336..d778cb4 100644
--- a/README.md
+++ b/README.md
@@ -126,7 +126,7 @@ In line with Grok-1, we have evaluated the model's mathematical capabilities usi
 
 result
 
-**Remark:** Some results are obtained by DeepSeek authors, while others are done by Grok-1 authors. We found some models count the score of the last question (Llemma 34b and Mammoth) while some (MetaMath-7B) are not in the original evaluation. In our evaluation, we count the last question score. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/dev/evaluation/hungarian_national_hs_solutions).
+**Remark:** Some results are obtained by DeepSeek authors, while others are done by Grok-1 authors. We found some models count the score of the last question (Llemma 34b and Mammoth) while some (MetaMath-7B) are not in the original evaluation. In our evaluation, we count the last question score. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/HEAD/evaluation/hungarian_national_hs_solutions).
 
 ---
 
@@ -159,7 +159,7 @@ The specific questions and test cases will be released soon. Stay tuned!
 | DeepSeek LLM 7B Chat  | 57.9 | 49.4 | 62.6 | 48.2 | 42.3 | 47.0 | 49.7 | 75.0 |
 | DeepSeek LLM 67B Chat | 81.5 | 71.1 | 84.1 | 73.8 | 71.7 | 65.2 | 67.8 | 85.1 |
 
-**Note:** We evaluate chat models with 0-shot for MMLU, GSM8K, C-Eval, and CMMLU. More evaluation results can be found [here](https://github.com/deepseek-ai/DeepSeek-LLM/blob/dev/evaluation/more_results.md).
+**Note:** We evaluate chat models with 0-shot for MMLU, GSM8K, C-Eval, and CMMLU. More evaluation results can be found [here](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/evaluation/more_results.md).
 
 **Revisit Multi-Choice Question Benchmarks**
 