From 839aec9993944f24f261953246cf44d3cef93a84 Mon Sep 17 00:00:00 2001
From: DeepSeekDDM <155411579+DeepSeekDDM@users.noreply.github.com>
Date: Tue, 9 Jan 2024 15:26:50 +0800
Subject: [PATCH] Update README.md

Add intro and evaluation. Citation will be updated later.
---
 README.md | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 061ac6c..1055068 100644
--- a/README.md
+++ b/README.md
@@ -56,18 +56,54 @@

 ## 1. Introduction

+DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters.
+It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared expert isolation (a simplified sketch of such a layer is included at the end of this README).
+It is trained from scratch on 2T tokens and achieves performance comparable to DeepSeek 7B and LLaMA2 7B with only about 40% of the computation.
+For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public; they can be deployed on a single GPU with 40GB of memory without the need for quantization.
+
 ## 2. Evaluation Results

+### DeepSeekMoE 16B Base
+
+We evaluate DeepSeekMoE 16B on various benchmarks and compare it with a series of models, as shown below.
+
+- Comparison with open-source models on the Open LLM Leaderboard. DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves performance comparable to LLaMA2 7B, which has approximately 2.5 times the activated parameters.
+
+[table: Open LLM Leaderboard comparison]
+
+- Comparison with DeepSeek 7B on our internal benchmarks. DeepSeek 7B is a dense model trained on the same corpus as DeepSeekMoE 16B. With only 40.5% of the computation, DeepSeekMoE 16B achieves performance comparable to DeepSeek 7B.
+
+[table: comparison with DeepSeek 7B on internal benchmarks]
+
+- Comparison with LLaMA2 7B on our internal benchmarks. With only 39.6% of the computation, DeepSeekMoE 16B outperforms LLaMA2 7B on the majority of benchmarks.
+
+[table: comparison with LLaMA2 7B on internal benchmarks]
+
+### DeepSeekMoE 16B Chat
+
+We also evaluate DeepSeekMoE 16B Chat on various benchmarks and compare it with DeepSeek 7B Chat and LLaMA2 7B SFT. For a fair comparison, all compared models are fine-tuned with the same setting and data.
+The evaluation results are shown below. With only about 40% of the computation, DeepSeekMoE 16B Chat achieves performance comparable to or better than DeepSeek 7B Chat and LLaMA2 7B SFT.
+
+[table: comparison of chat models]
+
 ## 3. Model Downloads

-We release the DeepSeek MoE 16B, including both base and chat models, to the public. To support a broader and more diverse range of research within both academic and commercial communities. Please **note** that the use of this model is subject to the terms outlined in [License section](#5-license). Commercial usage is permitted under these terms.
+We release DeepSeekMoE 16B, including both base and chat models, to the public in order to support a broader and more diverse range of research within both academic and commercial communities. Please **note** that the use of this model is subject to the terms outlined in the [License section](#5-license). Commercial usage is permitted under these terms.

 ### Huggingface

 | Model                 | Sequence Length | Download |
 |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
-| DeepSeek MoE 16B Base | 4096 | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base) |
-| DeepSeek MoE 16B Chat | 4096 | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) |
+| DeepSeekMoE 16B Base  | 4096 | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base) |
+| DeepSeekMoE 16B Chat  | 4096 | 🤗 [HuggingFace](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat) |

 ## 4. Quick Start
 ### Installation
@@ -136,7 +172,7 @@ Assistant:

 **Note:** By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` (`<|begin▁of▁sentence|>`) before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.

-### How to Fine-tune DeepSeek-MoE
+### How to Fine-tune DeepSeekMoE

 We provide script `fintune/finetune.py` for users to finetune our models on downstream tasks.

@@ -149,7 +185,7 @@ pip install -r requirements.txt

 Please follow [Sample Dataset Format](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) to prepare your training data. Each item has two required fields `instruction` and `output`.

-After data preparation, you can use the sample shell script to finetune deepseek-MoE model.
+After data preparation, you can use the sample shell script to finetune the DeepSeekMoE model (a toy example of the data format is included at the end of this README).
 Remember to specify `DATA_PATH`, `OUTPUT_PATH`. And please choose appropriate hyper-parameters(e.g., `learning_rate`, `per_device_train_batch_size`) according to your scenario.

@@ -221,12 +257,18 @@ deepspeed finetune.py \
 ```

 ## 5. License
-This code repository is licensed under the MIT License. The use of DeepSeek models is subject to the Model License. DeepSeek supports commercial use.
+This code repository is licensed under the MIT License. The use of DeepSeekMoE models is subject to the Model License. DeepSeekMoE supports commercial use.

 See the [LICENSE-CODE](LICENSE-CODE) and [LICENSE-MODEL](LICENSE-MODEL) for more details.

 ## 6. Citation
+```
+@article{deepseekmoe,
+  [coming soon]
+}
+```
+
 ## 7. Contact
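+
+To make the two strategies mentioned in the Introduction concrete, here is a minimal PyTorch sketch of a layer that combines always-active shared experts with top-k routed fine-grained experts. This is an illustration only, not the DeepSeekMoE implementation: the class names, sizes, expert counts, and top-k value below are invented.
+
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+class Expert(nn.Module):
+    """One small FFN; fine-grained segmentation means many such narrow experts."""
+
+    def __init__(self, d_model: int, d_hidden: int):
+        super().__init__()
+        self.w1 = nn.Linear(d_model, d_hidden, bias=False)
+        self.w2 = nn.Linear(d_hidden, d_model, bias=False)
+
+    def forward(self, x):
+        return self.w2(F.silu(self.w1(x)))
+
+
+class MoELayer(nn.Module):
+    """Shared experts always run; each token additionally picks top-k routed experts."""
+
+    def __init__(self, d_model=1024, d_hidden=512, n_routed=64, n_shared=2, top_k=6):
+        super().__init__()
+        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
+        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
+        self.gate = nn.Linear(d_model, n_routed, bias=False)
+        self.top_k = top_k
+
+    def forward(self, x):  # x: (n_tokens, d_model)
+        # Shared expert isolation: every token passes through the shared experts.
+        out = sum(expert(x) for expert in self.shared)
+        # Routing: each token selects its top-k routed experts by gate score.
+        scores = F.softmax(self.gate(x), dim=-1)            # (n_tokens, n_routed)
+        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
+        for slot in range(self.top_k):
+            idx = topk_idx[:, slot]                         # chosen expert per token
+            weight = topk_scores[:, slot].unsqueeze(-1)     # its gate weight
+            for expert_id in idx.unique().tolist():
+                mask = idx == expert_id
+                out[mask] += weight[mask] * self.routed[expert_id](x[mask])
+        return out
+
+
+print(MoELayer()(torch.randn(4, 1024)).shape)  # torch.Size([4, 1024])
+```
+
+Only the shared experts and each token's selected top-k routed experts run in the forward pass, which is why the activated parameter count (and hence the computation) is much smaller than the total parameter count.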
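+
+For reference, a single training record in the fine-tuning format described in Section 4 (two required fields, `instruction` and `output`) can be produced as follows. This is only a sketch: the field values and file name are invented, and whether `finetune.py` expects a JSON array or JSON-lines should be checked against its data-loading code before use.
+
+```python
+import json
+
+# One invented record with the two required fields.
+record = {
+    "instruction": "Summarize the following paragraph in one sentence: ...",
+    "output": "The paragraph explains ...",
+}
+
+# Write a tiny dataset as a JSON array; point DATA_PATH at this file.
+with open("toy_train_data.json", "w", encoding="utf-8") as f:
+    json.dump([record], f, ensure_ascii=False, indent=2)
+```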