Mirror of https://github.com/deepseek-ai/DeepSeek-MoE.git, synced 2025-02-23 06:09:05 -05:00
Fix typo in README.md
It just happens sometimes
parent 66edeee5a4
commit 5585d2b60b
@@ -62,7 +62,7 @@
 
 DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters.
 It employs an innovative MoE architecture, which involves two principal strategies: fine-grained expert segmentation and shared experts isolation.
-It is trained from scratch on 2T English and Chinese tokens, and exhibits comparable performance with DeekSeek 7B and LLaMA2 7B, with only about 40% of computations.
+It is trained from scratch on 2T English and Chinese tokens, and exhibits comparable performance with DeepSeek 7B and LLaMA2 7B, with only about 40% of computations.
 For research purposes, we release the model checkpoints of DeepSeekMoE 16B Base and DeepSeekMoE 16B Chat to the public, which can be deployed on a single GPU with 40GB of memory without the need for quantization.
 The model code file can be found [here](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py).
 
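The README text in this hunk notes that the released DeepSeekMoE 16B checkpoints can be deployed on a single 40GB GPU without quantization, with the custom model code hosted on Hugging Face. As a rough illustration (not part of this commit), a bf16 load via the transformers library might look like the sketch below; the model id and generation settings are assumptions based on the linked Hugging Face repo, not on this diff.

```python
# Minimal sketch (assumption, not from this commit): load the base checkpoint
# referenced in the diffed README and run a short generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-moe-16b-base"

# trust_remote_code=True is needed because the architecture lives in the
# repo's modeling_deepseek.py rather than in the transformers library itself.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # bf16 weights keep the 16.4B-parameter model within ~40GB
    device_map="auto",            # place the model on the single available GPU
    trust_remote_code=True,
)

inputs = tokenizer("DeepSeekMoE 16B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```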