From 30b7c65fb600cd0c3b842b456cc713f31c5a428e Mon Sep 17 00:00:00 2001
From: Afueth Thomas <97304915+Afueth@users.noreply.github.com>
Date: Mon, 27 Jan 2025 15:16:42 +0530
Subject: [PATCH] Update README.md

Updated the capitalization of the word "recommended" to "Recommended" in the
inference section headings (6.2-6.5) to ensure consistency with title case
formatting throughout the document. This change aligns these headings with the
rest of the README for a more polished and professional appearance.
---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7ecf87e..9b6496a 100644
--- a/README.md
+++ b/README.md
@@ -304,7 +304,7 @@ Or batch inference on a given file:
 torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
 ```
 
-### 6.2 Inference with SGLang (recommended)
+### 6.2 Inference with SGLang (Recommended)
 
 [SGLang](https://github.com/sgl-project/sglang) currently supports [MLA optimizations](https://lmsys.org/blog/2024-09-04-sglang-v0-3/#deepseek-multi-head-latent-attention-mla-throughput-optimizations), [DP Attention](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models), FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks.
 
@@ -316,18 +316,18 @@ Multi-Token Prediction (MTP) is in development, and progress can be tracked in t
 Here are the launch instructions from the SGLang team: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3
 
-### 6.3 Inference with LMDeploy (recommended)
+### 6.3 Inference with LMDeploy (Recommended)
 
 [LMDeploy](https://github.com/InternLM/lmdeploy), a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.
 
 For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: https://github.com/InternLM/lmdeploy/issues/2960
 
-### 6.4 Inference with TRT-LLM (recommended)
+### 6.4 Inference with TRT-LLM (Recommended)
 
 [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3.
 
-### 6.5 Inference with vLLM (recommended)
+### 6.5 Inference with vLLM (Recommended)
 
 [vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well.
 