Mirror of https://github.com/deepseek-ai/DeepSeek-VL.git (synced 2025-04-19 10:09:09 -04:00)
Update README.md
parent fd8e4c0c6a
commit a642ff8a61

README.md
@@ -209,70 +209,6 @@ print(f"{prepare_inputs['sft_format'][0]}", answer)
python cli_chat.py --model_path deepseek-ai/deepseek-vl-7b-chat
```
If you prefer not to use the provided `apply_chat_template` function, you can also interact with our model by following the sample template below. Note that `messages` should be replaced by your input.
```
User: {messages[0]['content']}

Assistant: {messages[1]['content']}<|end▁of▁sentence|>User: {messages[2]['content']}

Assistant:
```
**Note:** By default (`add_special_tokens=True`), our tokenizer automatically adds a `bos_token` (`<|begin▁of▁sentence|>`) before the input text. Additionally, since the system prompt is not compatible with this version of our models, we DO NOT RECOMMEND including the system prompt in your input.
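As an illustration, the template above can be filled in with plain Python string formatting. This is only a minimal sketch: the `messages` turns below are made-up examples, the exact newline placement is inferred from the template as rendered above, and the `bos_token` is left to the tokenizer (see the note on `add_special_tokens=True`).

```python
# Hypothetical conversation used only to illustrate the template.
messages = [
    {"role": "user", "content": "Describe this image."},
    {"role": "assistant", "content": "It shows a small harbor at sunset."},
    {"role": "user", "content": "What season does it look like?"},
]

EOS = "<|end▁of▁sentence|>"
prompt = ""
for m in messages:
    if m["role"] == "user":
        prompt += f"User: {m['content']}\n\n"
    else:
        # Assistant turns are terminated with the end-of-sentence token.
        prompt += f"Assistant: {m['content']}{EOS}"
prompt += "Assistant:"
# Feed `prompt` to the tokenizer as usual; with add_special_tokens=True the
# bos_token is prepended automatically, so it is not added here.
```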
### Inference with vLLM
You can also employ [vLLM](https://github.com/vllm-project/vllm) for high-throughput inference.
**Text Completion**
```python
from vllm import LLM, SamplingParams

tp_size = 4  # Tensor Parallelism
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-llm-67b-base"
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)

prompts = [
    "If everyone in a country loves one another,",
    "The research should also focus on the technologies",
    "To determine if the label is correct, we need to"
]
outputs = llm.generate(prompts, sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
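Each element of `outputs` is a vLLM `RequestOutput` object; besides the generated text collected above, it also carries the original prompt, which is convenient for printing prompts and completions side by side. A small sketch, reusing the `outputs` variable from the example above:

```python
# Print each prompt next to its first completion.
# `outputs` is the list returned by llm.generate(...) in the snippet above.
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}\n")
```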
**Chat Completion**
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tp_size = 4  # Tensor Parallelism
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
model_name = "deepseek-ai/deepseek-llm-67b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, trust_remote_code=True, gpu_memory_utilization=0.9, tensor_parallel_size=tp_size)

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "What can you do?"}],
    [{"role": "user", "content": "Explain Transformer briefly."}],
]
# Avoid adding bos_token repeatedly
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

sampling_params.stop = [tokenizer.eos_token]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
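Because `apply_chat_template` already returns token ids here, it is easy to sanity-check the prompt that will actually be fed to the model (and confirm that the `bos_token` is not added twice). A minimal check, reusing `tokenizer` and `prompt_token_ids` from the snippet above:

```python
# Decode the first prompt back to text to inspect the applied chat template.
decoded = tokenizer.decode(prompt_token_ids[0])
print(decoded)  # should begin with a single <|begin▁of▁sentence|> token
```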
## 6. FAQ
### Could You Provide the tokenizer.model File for Model Quantization?