DeepSeek-V3

mirror of https://github.com/deepseek-ai/DeepSeek-V3.git synced 2025-07-18 23:29:07 -04:00

Author	SHA1	Message	Date
Cristian Cezar Moisés	6e1d0ed9c6	Update model.py. Introduced constants for magic values. Created a function to initialize distributed settings. Added assertions and comments for clarity. Ensured proper docstrings and types for clarity. Improved formatting and structure to enhance readability.	2025-01-27 23:21:33 -03:00
Cristian Cezar Moisés	18323417c1	Update kernel.py. Improved docstrings for better understanding of the functions. Added specific error messages for input validation. Kept the structure of the code while making it easier to read and follow. Ensured that all exceptions provide meaningful messages to the user.	2025-01-27 23:19:26 -03:00
Cristian Cezar Moisés	ebbbf84d35	Update generate.py. Distributed Training Enhancements: Proper NCCL/Gloo backend selection; Distributed timeout handling; Rank-aware input broadcasting; Graceful process group cleanup. Error Handling & Validation: Comprehensive path validation; Config schema validation; Tokenization error handling; Batch processing safeguards; CUDA OOM fallback handling. Generation Improvements: Top-k sampling support; Repetition penalty; Dynamic sequence length management; Progress tracking with tqdm; Sequence truncation warnings. Performance Optimizations: Device-aware tensor placement; Batch tokenization; Memory-efficient generation loop; Model parallelism support. User Experience: Interactive mode enhancements (Command history, Input validation, Graceful exit handling); Batch processing (Progress tracking, Error resilience, Clean output formatting). Code Quality: Type hints throughout; Configurable constants; Modular architecture; Docstrings with examples; Logging integration. Safety Features: Tokenizer trust_remote_code handling; Config validation; Input sanitization; Resource cleanup guarantees.	2025-01-27 23:16:21 -03:00
Cristian Cezar Moisés	eee820cc36	Update fp8_cast_bf16.py. Type Hints & Path Management: Added comprehensive type annotations; Used pathlib.Path for safer path handling. Enhanced Error Handling: Structured exception handling throughout; Clear error messages with context; Safe resource cleanup. Memory Management: LRU cache implementation with OrderedDict; Configurable cache size; Explicit GPU memory cleanup. Logging System: Configurable logging levels; Detailed progress tracking; Structured error reporting. Code Organization: Split into focused, testable functions; Clear separation of concerns; Documented public methods. Validation & Safety: Input path validation; Weight type checking; Clone tensors to prevent reference issues. Performance: Optimized file loading with LRU cache; Batched tensor processing; Asynchronous CUDA operations. Metadata & Traceability: Added conversion metadata to output files; Preserved original index structure; Enhanced output index information. Configuration: Centralized constants; Device-aware execution (CUDA/CPU). Progress Tracking: Nested progress bars; Detailed file processing status.	2025-01-27 23:13:11 -03:00
Cristian Cezar Moisés	a26fca4a41	Update convert.py	2025-01-27 23:10:08 -03:00
enoch kan	bc77f22afc	Updated model.py docstrings	2025-01-05 18:24:31 +00:00
enoch kan	a1296f099e	Enhance documentation and update .gitignore for model conversion scripts	2025-01-05 18:18:18 +00:00
GeeeekExplorer	fd011c11aa	torch rmsnorm	2025-01-05 14:33:48 +08:00
Xingkai Yu	8710ec2ecb	require model-parallel in convert.py	2024-12-31 18:05:55 +08:00
Yang Wang	8f1c9488b5	handle missing scale_inv_name (#2) * handle missing scale_inv_name Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict. * sort filename to reduce memory costs * Add CUDA cache clearing in memory management Added torch.cuda.empty_cache() to free up unused memory on the GPU,	2024-12-27 09:34:38 +08:00
stack-heap-overflow	4c2fdb8f55	Release DeepSeek-V3	2024-12-26 19:01:57 +08:00

11 Commits