DeepSeek-V3

mirror of https://github.com/deepseek-ai/DeepSeek-V3.git synced 2025-07-18 15:19:09 -04:00

Cristian Cezar Moisés	ebbbf84d35	Update generate.py	2025-01-27 23:16:21 -03:00

Distributed Training Enhancements: proper NCCL/Gloo backend selection; distributed timeout handling; rank-aware input broadcasting; graceful process group cleanup
Error Handling & Validation: comprehensive path validation; config schema validation; tokenization error handling; batch processing safeguards; CUDA OOM fallback handling
Generation Improvements: top-k sampling support; repetition penalty; dynamic sequence length management; progress tracking with tqdm; sequence truncation warnings
Performance Optimizations: device-aware tensor placement; batch tokenization; memory-efficient generation loop; model parallelism support
User Experience, interactive mode enhancements: command history; input validation; graceful exit handling
User Experience, batch processing: progress tracking; error resilience; clean output formatting
Code Quality: type hints throughout; configurable constants; modular architecture; docstrings with examples; logging integration
Safety Features: tokenizer trust_remote_code handling; config validation; input sanitization; resource cleanup guarantees
configs	Release DeepSeek-V3	2024-12-26 19:01:57 +08:00
convert.py	Update convert.py	2025-01-27 23:10:08 -03:00
fp8_cast_bf16.py	Update fp8_cast_bf16.py	2025-01-27 23:13:11 -03:00
generate.py	Update generate.py	2025-01-27 23:16:21 -03:00
kernel.py	Enhance documentation and update .gitignore for model conversion scripts	2025-01-05 18:18:18 +00:00
model.py	Updated model.py docstrings	2025-01-05 18:24:31 +00:00
requirements.txt	Release DeepSeek-V3	2024-12-26 19:01:57 +08:00