Update README.md


## 6. FAQ
### Could You Provide the tokenizer.model File for Model Quantization?
DeepSeek LLM utilizes the [HuggingFace Tokenizer](https://huggingface.co/docs/tokenizers/index) to implement the Byte-level BPE algorithm, with specially designed pre-tokenizers to ensure optimal performance. Currently, there is no direct way to convert the tokenizer into a SentencePiece tokenizer. We are contributing to open-source quantization methods to facilitate the use of the HuggingFace Tokenizer.
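As a quick illustration (a minimal sketch, not from the official examples; the checkpoint id `deepseek-ai/deepseek-llm-7b-base` and the use of `trust_remote_code` are assumptions), the tokenizer can be loaded and exercised directly through `transformers`:

```python
# Hedged sketch: loading the byte-level BPE tokenizer via HuggingFace.
# Assumption: "deepseek-ai/deepseek-llm-7b-base" is the target checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-base", trust_remote_code=True
)

ids = tokenizer.encode("Hello, world!")
print(ids)                    # token ids produced by byte-level BPE
print(tokenizer.decode(ids))  # round-trips back to the original text

# Note: the vocabulary and pre-tokenizers live in tokenizer.json; there is no
# SentencePiece tokenizer.model file to hand to quantization tools directly.
```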
#### GGUF (llama.cpp)
We have submitted a [PR](https://github.com/ggerganov/llama.cpp/pull/4070) to the popular quantization repository [llama.cpp](https://github.com/ggerganov/llama.cpp) to fully support all HuggingFace pre-tokenizers, including ours.
While waiting for the PR to be merged, you can generate your GGUF model using the following steps:
```bash
# clone the fork that contains the pre-tokenizer PR and switch to its branch
git clone https://github.com/DOGEwbx/llama.cpp.git
cd llama.cpp
git checkout regex_gpt2_preprocess
# build llama.cpp and install the Python conversion requirements
make
python3 -m pip install -r requirements.txt
# generate GGUF model
python convert-hf-to-gguf.py <MODEL_PATH> --outfile <GGUF_PATH> --model-name deepseekllm
# use q4_0 quantization as an example
./quantize <GGUF_PATH> <OUTPUT_PATH> q4_0
./main -m <OUTPUT_PATH> -n 128 -p <PROMPT>
```
#### GPTQ (exllamav2)
`UPDATE:` [exllamav2](https://github.com/turboderp/exllamav2) now supports the HuggingFace Tokenizer. Please pull the latest version and try it out.
Remember to set RoPE scaling to 4 for correct output; more discussion can be found in this [PR](https://github.com/turboderp/exllamav2/pull/189).
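For reference, here is a minimal loading sketch using exllamav2's Python API with the scaling factor applied. This is an assumption-laden example, not code from the repository: the attribute `scale_pos_emb` (linear RoPE scaling) and the generator classes are taken from exllamav2's examples of that era and may differ across versions; `<QUANTIZED_MODEL_PATH>` and `<PROMPT>` are placeholders.

```python
# Hedged sketch: load a quantized DeepSeek model with exllamav2 and set
# linear RoPE scaling to 4, per the discussion in exllamav2 PR #189.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "<QUANTIZED_MODEL_PATH>"  # placeholder: quantized model dir
config.prepare()
config.scale_pos_emb = 4.0  # assumed attribute for linear RoPE scaling

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load weights, splitting across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()

print(generator.generate_simple("<PROMPT>", settings, num_tokens=128))
```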
## 7. Limitation