From 6ed131d4a27f68b5adb8748a344d641553acc3a6 Mon Sep 17 00:00:00 2001
From: zwd973-deepseek <zengwangding@deepseek.com>
Date: Thu, 11 Jan 2024 10:50:34 +0800
Subject: [PATCH] initial commit

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1055068..863b278 100644
--- a/README.md
+++ b/README.md
@@ -60,6 +60,7 @@ DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B paramete
 It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared experts isolation. 
 It is trained from scratch on 2T tokens, and exhibits comparable performance with DeekSeek 7B and LLaMA2 7B, with only about 40% of computations. 
 For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public, which can be deployed on a single GPU with 40GB of memory without the need for quantization.
+The model implementation is available in [modeling_deepseek.py](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py).
 
 ## 2. Evaluation Results