Update README.md

This commit is contained in:
StevenLiuWen 2024-03-08 16:17:27 +08:00
parent fd8e4c0c6a
commit a642ff8a61


@@ -209,70 +209,6 @@ print(f"{prepare_inputs['sft_format'][0]}", answer)
python cli_chat.py --model_path deepseek-ai/deepseek-vl-7b-chat
```
If you prefer not to use the provided `apply_chat_template` function, you can also interact with our model by following the sample template below. Note that `messages` should be replaced by your input.
```
User: {messages[0]['content']}
Assistant: {messages[1]['content']}<｜end▁of▁sentence｜>User: {messages[2]['content']}
Assistant:
```
**Note:** By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` (`<｜begin▁of▁sentence｜>`) before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.
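For example, here is a minimal sketch of building the prompt manually and tokenizing it with the default settings, assuming the `deepseek-ai/deepseek-llm-67b-chat` tokenizer used elsewhere in this section. The `messages` list is a hypothetical two-turn conversation used only for illustration, and the turn separators should mirror the sample template above exactly.
```python
from transformers import AutoTokenizer

model_name = "deepseek-ai/deepseek-llm-67b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical conversation history; replace it with your own input.
messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am an AI assistant."},
    {"role": "user", "content": "What can you do?"},
]

# Assemble the prompt following the sample template above, using the tokenizer's
# eos_token to terminate the finished assistant turn.
prompt = (
    f"User: {messages[0]['content']}\n"
    f"Assistant: {messages[1]['content']}{tokenizer.eos_token}"
    f"User: {messages[2]['content']}\n"
    "Assistant:"
)

# With the default add_special_tokens=True, the tokenizer prepends the bos_token
# automatically, so it must not be written into the prompt string itself.
inputs = tokenizer(prompt, return_tensors="pt")
```
The resulting `inputs` can then be passed to the model's `generate` method.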
### Inference with vLLM
You can also employ [vLLM](https://github.com/vllm-project/vllm) for high-throughput inference.
**Text Completion**
```python
from vllm import LLM, SamplingParams
tp_size = 4 # Tensor Parallelism
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-llm-67b-base"
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
prompts = [
"If everyone in a country loves one another,",
"The research should also focus on the technologies",
"To determine if the label is correct, we need to"
]
outputs = llm.generate(prompts, sampling_params)
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
**Chat Completion**
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
tp_size = 4 # Tensor Parallelism
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-llm-67b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)
messages_list = [
[{"role": "user", "content": "Who are you?"}],
[{"role": "user", "content": "What can you do?"}],
[{"role": "user", "content": "Explain Transformer briefly."}],
]
# Avoid adding bos_token repeatedly
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
sampling_params.stop = [tokenizer.eos_token]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
## 6. FAQ
### Could You Provide the tokenizer.model File for Model Quantization?