Mirror of https://github.com/deepseek-ai/DeepSeek-V3.git (synced 2025-02-23 14:18:57 -05:00)
This update introduces parallel processing for token generation using torch.multiprocessing.Pool. The new implementation improves inference speed by processing multiple sequences concurrently.

- Added the generate_parallel() function for parallel token generation.
- Used multiprocessing to distribute the workload across multiple processes, allowing faster token generation for multiple prompts.
- Added the generate_single_sequence() function to handle the generation logic for an individual sequence; each worker calls it in parallel.
- Introduced the num_workers parameter to control the number of worker processes (default is 4).
- The model is shared across processes for efficient memory usage.

These changes are particularly beneficial for batch processing or multi-prompt generation scenarios where multiple sequences need to be generated simultaneously.
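The commit message describes the interface but not the code itself. Below is a minimal, self-contained sketch of how generate_parallel() and generate_single_sequence() might be wired together with torch.multiprocessing, assuming greedy decoding on CPU. The toy embedding-plus-linear model, the _init_worker helper, and the sampling loop are illustrative assumptions, not the repository's actual model.py or generate.py code.

```python
import torch
import torch.multiprocessing as mp

# Per-process handle to the shared model, set by the Pool initializer.
# (Hypothetical structure; the real DeepSeek-V3 Transformer lives in model.py.)
model = None


def _init_worker(shared_model):
    # Pool initializer: give each worker process a reference to the
    # shared-memory model so weights are not copied per worker.
    global model
    model = shared_model


def generate_single_sequence(args):
    # Generate one sequence with greedy decoding (sampling strategy
    # is an assumption here, not taken from the commit).
    prompt_ids, max_new_tokens = args
    tokens = list(prompt_ids)
    with torch.no_grad():
        for _ in range(max_new_tokens):
            inp = torch.tensor([tokens], dtype=torch.long)
            logits = model(inp)                  # (1, seq_len, vocab)
            next_id = int(logits[0, -1].argmax())
            tokens.append(next_id)
    return tokens


def generate_parallel(shared_model, prompts, max_new_tokens=32, num_workers=4):
    # Distribute prompts across num_workers processes; every worker
    # reads the same shared weights via the initializer above.
    with mp.Pool(num_workers,
                 initializer=_init_worker,
                 initargs=(shared_model,)) as pool:
        return pool.map(generate_single_sequence,
                        [(p, max_new_tokens) for p in prompts])


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in "language model": embedding + linear head over a
    # 100-token vocabulary, purely for demonstration.
    toy = torch.nn.Sequential(
        torch.nn.Embedding(100, 16),
        torch.nn.Linear(16, 100),
    )
    toy.share_memory()  # place parameters in CPU shared memory
    outputs = generate_parallel(toy, [[1, 2, 3], [4, 5]], max_new_tokens=5)
    print(outputs)
```

Note that this sharing relies on share_memory(), which places parameters in CPU shared memory; CUDA tensors cannot be shared across fork-started workers, so a GPU version would need a different strategy (for example, one device per worker).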
configs
convert.py
fp8_cast_bf16.py
generate.py
kernel.py
model.py
requirements.txt