diff --git a/README.md b/README.md
index 228ee67..ec27336 100644
--- a/README.md
+++ b/README.md
@@ -297,9 +297,13 @@ print(generated_text)
 
 ## 6. FAQ
 
-### Could You Provide the tokenizer.model File for GGUF Model Quantization?
+### Could You Provide the tokenizer.model File for Model Quantization?
 
-DeepSeek LLM utilizes the [HuggingFace Tokenizer](https://huggingface.co/docs/tokenizers/index) to implement the Bytelevel-BPE algorithm, with specially designed pre-tokenizers to ensure optimal performance. Currently, there is no direct way to convert the tokenizer into a SentencePiece tokenizer. We have submitted a [PR](https://github.com/ggerganov/llama.cpp/pull/4070) to the popular quantization repository [llama.cpp](https://github.com/ggerganov/llama.cpp) to fully support all HuggingFace pre-tokenizers, including ours.
+DeepSeek Coder utilizes the [HuggingFace Tokenizer](https://huggingface.co/docs/tokenizers/index) to implement the Bytelevel-BPE algorithm, with specially designed pre-tokenizers to ensure optimal performance. Currently, there is no direct way to convert the tokenizer into a SentencePiece tokenizer. We are contributing to open-source quantization methods to facilitate the use of the HuggingFace Tokenizer.
+
+#### GGUF(llama.cpp)
+
+We have submitted a [PR](https://github.com/ggerganov/llama.cpp/pull/4070) to the popular quantization repository [llama.cpp](https://github.com/ggerganov/llama.cpp) to fully support all HuggingFace pre-tokenizers, including ours.
 
 While waiting for the PR to be merged, you can generate your GGUF model using the following steps:
 
@@ -311,11 +315,17 @@
 git checkout regex_gpt2_preprocess
 make
 python3 -m pip install -r requirements.txt
 # generate GGUF model
-python convert-hf-to-gguf.py <MODEL_PATH> --outfile <GGUF_PATH> --model-name deepseekcoder
+python convert-hf-to-gguf.py <MODEL_PATH> --outfile <GGUF_PATH> --model-name deepseekllm
 # use q4_0 quantization as an example
 ./quantize <GGUF_PATH> <OUTPUT_PATH> q4_0
 ./main -m <OUTPUT_PATH> -n 128 -p <PROMPT>
 ```
+
+#### GPTQ(exllamav2)
+
+`UPDATE:` [exllamav2](https://github.com/turboderp/exllamav2) now supports the HuggingFace Tokenizer. Please pull the latest version and try it out.
+
+Remember to set RoPE scaling to 4 for correct output; more discussion can be found in this [PR](https://github.com/turboderp/exllamav2/pull/189).
 
 ## 7. Limitation
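
For the exllamav2 note added above, a minimal Python sketch of what "set RoPE scaling to 4" could look like in code is shown below. It assumes exllamav2's config exposes linear RoPE scaling via `scale_pos_emb` (as it did around the time of the linked PR) and uses a hypothetical model directory; treat it as an illustration under those assumptions, not a snippet from this patch.

```python
# Sketch: load an exllamav2-quantized model with linear RoPE scaling set to 4.
# ASSUMPTIONS: `scale_pos_emb` is exllamav2's linear RoPE scaling setting,
# and "/path/to/deepseek-exl2" is a hypothetical placeholder directory.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/deepseek-exl2"  # hypothetical path to the quantized model
config.prepare()
config.scale_pos_emb = 4.0  # RoPE scaling factor of 4, per the note above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Generate a short completion to confirm the model produces coherent output.
print(generator.generate_simple("def fib(n):", settings, 128))
```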