Mirror of https://github.com/deepseek-ai/DeepSeek-V3.git
Update README.md
Updated the capitalization of "recommended" to "Recommended" in several section headings to ensure consistency with the title-case formatting used throughout the document. This change aligns the heading style with the rest of the README for a more polished and professional appearance.
commit 30b7c65fb6
parent b5d872ead0
@@ -304,7 +304,7 @@ Or batch inference on a given file:
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
```

-### 6.2 Inference with SGLang (recommended)
+### 6.2 Inference with SGLang (Recommended)

[SGLang](https://github.com/sgl-project/sglang) currently supports [MLA optimizations](https://lmsys.org/blog/2024-09-04-sglang-v0-3/#deepseek-multi-head-latent-attention-mla-throughput-optimizations), [DP Attention](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models), FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks.

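As a minimal sketch (not part of this commit) of how the SGLang route can be exercised: the server exposes an OpenAI-compatible endpoint, so a standard `openai` client can query it. The launch command, port, and served model name below are assumptions based on SGLang's documented defaults; the SGLang team's launch instructions linked below are authoritative.

```python
# Sketch: query an SGLang server serving DeepSeek-V3 via its OpenAI-compatible API.
# Assumes the server was launched with something like:
#   python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 \
#       --tp 8 --trust-remote-code --port 30000
# (see the SGLang launch instructions linked below for the authoritative flags).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Briefly explain multi-head latent attention."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```
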
@@ -316,18 +316,18 @@ Multi-Token Prediction (MTP) is in development, and progress can be tracked in t

Here are the launch instructions from the SGLang team: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3

-### 6.3 Inference with LMDeploy (recommended)
+### 6.3 Inference with LMDeploy (Recommended)

[LMDeploy](https://github.com/InternLM/lmdeploy), a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.

For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: https://github.com/InternLM/lmdeploy/issues/2960

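For orientation, a rough sketch of the offline pipeline route mentioned above; the tensor-parallel degree and engine configuration are assumptions for an 8-GPU node, and the LMDeploy issue linked above remains the authoritative guide.

```python
# Sketch: offline batch inference with LMDeploy's pipeline API.
# tp=8 assumes a single node with 8 GPUs; adjust to your hardware.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "deepseek-ai/DeepSeek-V3",
    backend_config=PytorchEngineConfig(tp=8),
)

prompts = ["Summarize the benefits of FP8 inference in two sentences."]
responses = pipe(prompts)
for r in responses:
    print(r.text)
```
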
-### 6.4 Inference with TRT-LLM (recommended)
+### 6.4 Inference with TRT-LLM (Recommended)

[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3.

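For a rough idea of what usage can look like, here is a sketch based on TensorRT-LLM's high-level Python `LLM` API. Whether the dedicated DeepSeek-V3 branch exposes this path, and with which precision options, is an assumption; the examples/deepseek_v3 directory linked above should be treated as the reference workflow.

```python
# Sketch: high-level generation with TensorRT-LLM's LLM API (recent releases).
# Assumption: the DeepSeek-V3 branch linked above may instead require its own
# checkpoint-conversion and engine-build steps from examples/deepseek_v3.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-V3")

outputs = llm.generate(
    ["Explain weight-only INT8 quantization in one paragraph."],
    SamplingParams(max_tokens=128, temperature=0.3),
)
for out in outputs:
    print(out.outputs[0].text)
```
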
-### 6.5 Inference with vLLM (recommended)
+### 6.5 Inference with vLLM (Recommended)

[vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
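A minimal offline-inference sketch with vLLM's Python API follows; the parallelism setting is an assumption for a single 8-GPU node, and multi-node pipeline parallelism should follow the vLLM distributed-serving docs linked above.

```python
# Sketch: offline BF16/FP8 inference with vLLM's LLM class on one 8-GPU node.
# For multi-node pipeline parallelism, see the vLLM distributed-serving docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,   # assumption: a single node with 8 GPUs
    trust_remote_code=True,
)

sampling = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Write a haiku about mixture-of-experts models."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```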