mirror of
https://github.com/deepseek-ai/DeepSeek-V3.git
synced 2025-04-19 10:08:59 -04:00
Docs: add LightLLM as supported engine
parent
592fd5daf8
commit
1ab09c8780
13
README.md
@@ -233,8 +233,9 @@ DeepSeek-V3 can be deployed locally using the following hardware and open-source
 3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment.
 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
 5. **vLLM**: Support DeepSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
-6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
-7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
+6. **LightLLM**: Supports single-node or multi-node deployment with DeepSeek-V3 FP8 and BF16.
+7. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
+8. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices.
 
 Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation.
 
@@ -328,11 +329,15 @@ For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy
 
 [vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
 
-### 6.6 Recommended Inference Functionality with AMD GPUs
+### 6.6 Inference with LightLLM (recommended)
+
+[LightLLM](https://github.com/ModelTC/lightllm/tree/main) LightLLM v1.0.1 supports single-machine and multi-machine tensor parallelism deployment for DeepSeek-R1 (FP8/BF16), achieving state-of-the-art performance. For more details, please refer to [LightLLM instructions](https://lightllm-en.readthedocs.io/en/latest/getting_started/quickstart.html). Additionally, LightLLM offers PD-disaggregation deployment for DeepSeek-V2, and the implementation of PD-disaggregation for DeepSeek-V3 is in development.
+
+### 6.7 Recommended Inference Functionality with AMD GPUs
 
 In collaboration with the AMD team, we have achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the [SGLang instructions](#63-inference-with-lmdeploy-recommended).
 
-### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs
+### 6.8 Recommended Inference Functionality with Huawei Ascend NPUs
 The [MindIE](https://www.hiascend.com/en/software/mindie) framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the [instructions here](https://modelers.cn/models/MindIE/deepseekv3).
 
 