From 70aaeb30ff75f545b70d3649f15f517b4213b907 Mon Sep 17 00:00:00 2001
From: DeepSeekPH <152240452+DeepSeekPH@users.noreply.github.com>
Date: Tue, 9 Jan 2024 12:57:32 +0800
Subject: [PATCH] Update README.md (#29)

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ce7465e..7d60125 100644
--- a/README.md
+++ b/README.md
@@ -130,13 +130,12 @@ In line with Grok-1, we have evaluated the model's mathematical capabilities usi
 ---
-**Instruction Following Evaluation:** On Nov 15th, 2023, Google released an [instruction following evaluation dataset](https://arxiv.org/pdf/2311.07911.pdf). They identified 25 types of verifiable instructions and constructed around 500 prompts, with each prompt containing one or more verifiable instructions. We use the prompt-level loose metric to evaluate all models.
+**Instruction Following Evaluation:** On Nov 15th, 2023, Google released an [instruction following evaluation dataset](https://arxiv.org/pdf/2311.07911.pdf). They identified 25 types of verifiable instructions and constructed around 500 prompts, with each prompt containing one or more verifiable instructions. We use the prompt-level loose metric to evaluate all models. Here, we used the first version released by Google for the evaluation. For the Google revised test set evaluation results, please refer to the number in our paper.
-
 ---
 **LeetCode Weekly Contest:**