Add files via upload

zzt0504 2025-02-05 19:34:27 +08:00 committed by GitHub
parent 3ed90a74af
commit 72f26006e6
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@@ -116,27 +116,27 @@ DeepSeek-R1-Distill models are fine-tuned from open-source models, using DeepSeek-R1-generated
 | | Architecture | - | - | MoE | - | - | MoE |
 | | Activated parameters | - | - | 37B | - | - | 37B |
 | | Total parameters | - | - | 671B | - | - | 671B |
-| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 |
-| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** |
-| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** |
-| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** |
-| | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 |
-| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 |
-| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 |
+| English | Massive Multitask Language Understanding (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 |
+| | Massive Multitask Language Understanding, Redux set (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** |
+| | Massive Multitask Language Understanding, Pro set (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** |
+| | Paragraph-level discrete reasoning (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** |
+| | IF-Eval, evaluation focused on verifiable instructions (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 |
+| | Graduate-level Google Q&A benchmark (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 |
+| | OpenAI SimpleQA evaluation (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 |
 | | FRAMES (Accuracy) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** |
-| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** |
-| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** |
-| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** |
-| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 |
-| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 |
+| | Tatsu Lab's AlpacaEval2.0, automatic evaluation of instruction-following language models (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** |
+| | ArenaHard benchmark (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** |
+| Code | LiveCodeBench coding benchmark (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** |
+| | Codeforces benchmark | 20.3% | 23.6% | 58.7% | 93.4 | **96.6%** | 96.3% |
+| | Codeforces benchmark (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 |
 | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
 | | Aider-Polyglot (Accuracy) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 |
-| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** |
-| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** |
-| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** |
-| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** |
-| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** |
-| | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
+| Math | American Invitational Mathematics Examination 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** |
+| | MATH-500 math problem set (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** |
+| | Chinese Mathematical Olympiad 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** |
+| Chinese | CLUEWSC Chinese language understanding evaluation benchmark (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** |
+| | Chinese large-model evaluation benchmark (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** |
+| | C-SimpleQA, Chinese factuality evaluation set for large language models (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |
 </div>