Commit Graph

38 Commits

Cristian Cezar Moisés
e1ed2e8465
Update fp8_cast_bf16.py
Increased Clarity: Added more comments and detailed docstrings to improve clarity and maintainability.
Efficient Dictionary Comprehension: Used dictionary comprehension to filter out None values in new_state_dict.
Safe Dictionary Modification: Used pop with a default value to safely remove keys from the dictionary without raising exceptions.
Consistent Type Hinting: Enhanced type hints for better clarity and consistency.
2025-01-27 23:24:46 -03:00
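The two dictionary idioms this commit describes might look like the following sketch; the names (`filter_none`, the sample keys) are illustrative, not taken from the repo:

```python
def filter_none(state_dict: dict) -> dict:
    """Drop entries whose value is None via a dictionary comprehension."""
    return {k: v for k, v in state_dict.items() if v is not None}


state_dict = {"weight": [1.0, 2.0], "weight_scale_inv": None}
new_state_dict = filter_none(state_dict)

# pop with a default never raises KeyError, even when the key is absent.
removed = new_state_dict.pop("weight_scale_inv", None)
```

Compared with `del d[key]`, `d.pop(key, None)` makes the "key may already be gone" case explicit and exception-free.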
Cristian Cezar Moisés
6e51b03eb1
Update convert.py
Refactored File Copying: The token file copying logic is now encapsulated in its own function, copy_token_files.
Improved Logging: Added more context to the logs to enhance debugging capabilities.
Type Hints: Ensured that all functions have clear type hints.
Error Handling: Improved error messages to provide more insight.
Code Readability: Improved overall readability by breaking down complex functions into simpler helper functions.
2025-01-27 23:23:28 -03:00
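A minimal sketch of the encapsulated `copy_token_files` helper with contextual logging, assuming tokenizer files are matched by a simple glob pattern (the pattern and signature are guesses, not the repo's actual code):

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def copy_token_files(src_dir: Path, dst_dir: Path) -> None:
    """Copy tokenizer-related files from src_dir to dst_dir, logging each copy."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in src_dir.glob("token*"):
        target = dst_dir / path.name
        shutil.copy2(path, target)  # copy2 preserves file metadata
        logger.info("Copied %s -> %s", path, target)
```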
Cristian Cezar Moisés
6e1d0ed9c6
Update model.py
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Ensured proper docstrings and types for clarity.
Improved formatting and structure to enhance readability.
2025-01-27 23:21:33 -03:00
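The "constants for magic values" plus "initialize distributed settings" pattern could be sketched as below; the constant names and environment-variable fallbacks are assumptions for illustration, not the repo's actual implementation:

```python
import os

# Named constants instead of magic values scattered through the code.
DEFAULT_WORLD_SIZE = 1
DEFAULT_RANK = 0


def get_distributed_settings() -> tuple[int, int]:
    """Read world size and rank from the environment, defaulting to single-process."""
    world_size = int(os.environ.get("WORLD_SIZE", DEFAULT_WORLD_SIZE))
    rank = int(os.environ.get("RANK", DEFAULT_RANK))
    assert 0 <= rank < world_size, f"rank {rank} out of range for world size {world_size}"
    return world_size, rank
```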
Cristian Cezar Moisés
18323417c1
Update kernel.py
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
2025-01-27 23:19:26 -03:00
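"Specific error messages for input validation" might look like the following sketch (the function name and the multiple-of-16 rule are hypothetical stand-ins for whatever kernel.py actually checks):

```python
def validate_block_size(block_size: int) -> None:
    """Reject invalid inputs with messages that name the offending value."""
    if not isinstance(block_size, int):
        raise TypeError(f"block_size must be an int, got {type(block_size).__name__}")
    if block_size <= 0 or block_size % 16 != 0:
        raise ValueError(f"block_size must be a positive multiple of 16, got {block_size}")
```

The payoff over a bare `assert` is that the exception text tells the user both the constraint and the value that violated it.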
Cristian Cezar Moisés
ebbbf84d35
Update generate.py
Distributed Training Enhancements:
        Proper NCCL/Gloo backend selection
        Distributed timeout handling
        Rank-aware input broadcasting
        Graceful process group cleanup

    Error Handling & Validation:
        Comprehensive path validation
        Config schema validation
        Tokenization error handling
        Batch processing safeguards
        CUDA OOM fallback handling

    Generation Improvements:
        Top-k sampling support
        Repetition penalty
        Dynamic sequence length management
        Progress tracking with tqdm
        Sequence truncation warnings

    Performance Optimizations:
        Device-aware tensor placement
        Batch tokenization
        Memory-efficient generation loop
        Model parallelism support

    User Experience:
        Interactive mode enhancements:
            Command history
            Input validation
            Graceful exit handling

        Batch processing:
            Progress tracking
            Error resilience
            Clean output formatting

    Code Quality:
        Type hints throughout
        Configurable constants
        Modular architecture
        Docstrings with examples
        Logging integration

    Safety Features:
        Tokenizer trust_remote_code handling
        Config validation
        Input sanitization
        Resource cleanup guarantees
2025-01-27 23:16:21 -03:00
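The top-k sampling with repetition penalty mentioned under "Generation Improvements" can be sketched in plain Python (a simplified stand-in for the tensor version; the penalty rule shown — divide positive logits, multiply negative ones — is one common convention, and all names here are illustrative):

```python
import math
import random


def top_k_sample(logits, k, generated, repetition_penalty=1.2, rng=None):
    """Penalize already-generated tokens, keep the k largest logits, sample via softmax."""
    rng = rng or random.Random(0)
    penalized = [
        l / repetition_penalty if (i in generated and l > 0)
        else l * repetition_penalty if i in generated
        else l
        for i, l in enumerate(logits)
    ]
    # Indices of the k largest penalized logits.
    top = sorted(range(len(penalized)), key=lambda i: penalized[i], reverse=True)[:k]
    exps = [math.exp(penalized[i]) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the top-k distribution.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```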
Cristian Cezar Moisés
eee820cc36
Update fp8_cast_bf16.py
Type Hints & Path Management:
        Added comprehensive type annotations
        Used pathlib.Path for safer path handling

    Enhanced Error Handling:
        Structured exception handling throughout
        Clear error messages with context
        Safe resource cleanup

    Memory Management:
        LRU cache implementation with OrderedDict
        Configurable cache size
        Explicit GPU memory cleanup

    Logging System:
        Configurable logging levels
        Detailed progress tracking
        Structured error reporting

    Code Organization:
        Split into focused, testable functions
        Clear separation of concerns
        Documented public methods

    Validation & Safety:
        Input path validation
        Weight type checking
        Clone tensors to prevent reference issues

    Performance:
        Optimized file loading with LRU cache
        Batched tensor processing
        Asynchronous CUDA operations

    Metadata & Traceability:
        Added conversion metadata to output files
        Preserved original index structure
        Enhanced output index information

    Configuration:
        Centralized constants
        Device-aware execution (CUDA/CPU)

    Progress Tracking:
        Nested progress bars
        Detailed file processing status
2025-01-27 23:13:11 -03:00
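The "LRU cache implementation with OrderedDict" for file loading could follow this shape (class and method names are illustrative, not the repo's):

```python
from collections import OrderedDict


class FileCache:
    """A small LRU cache keyed by filename, with a configurable size bound."""

    def __init__(self, max_size: int = 2):
        self.max_size = max_size
        self._cache: OrderedDict = OrderedDict()

    def get(self, name, loader):
        """Return the cached value for name, loading and evicting as needed."""
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        value = loader(name)
        self._cache[name] = value
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value
```

`OrderedDict` keeps insertion order, so `move_to_end` on a hit plus `popitem(last=False)` on overflow gives LRU eviction in a few lines.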
Cristian Cezar Moisés
a26fca4a41
Update convert.py 2025-01-27 23:10:08 -03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966 docs: Update system requirements with GitHub Markdown callout 2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef docs: Improve system requirements section formatting 2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e docs: Add system requirements for DeepSeek-Infer demo 2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo. 2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279 Add CITATION.cff to provide citation metadata
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc Updated model.py docstrings 2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e Enhance documentation and update .gitignore for model conversion scripts 2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa torch rmsnorm 2025-01-05 14:33:48 +08:00
Xingkai Yu
9b288b86cc
Update README.md 2025-01-03 15:30:48 +08:00
Huang Panpan
0d16ea24c8
Merge pull request #206 from kutt/patch-1
use alert formatting for notes in readme
2025-01-03 09:48:03 +08:00
kutt
21bc231f32
use alert formatting for notes in readme 2025-01-02 15:02:52 +01:00
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py 2024-12-31 18:05:55 +08:00
Huang Panpan
7c2466b310
Update issue templates 2024-12-31 14:49:05 +08:00
Huang Panpan
1b8e18cc29
Merge pull request #21 from eltociear/patch-1
docs: update README.md
2024-12-30 15:03:30 +08:00
Haswell Iris
94410f8d58
Merge pull request #33 from zhyncs/main
docs: update SGLang usage
2024-12-30 14:37:38 +08:00
zhyncs
68d0061937 upd 2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf upd 2024-12-30 14:21:00 +08:00
zhyncs
a1edf4138e upd 2024-12-30 14:18:00 +08:00
zhyncs
8638950ec2 docs: update SGLang usage 2024-12-30 14:13:27 +08:00
DeepSeekDDM
83dd18eda4
Update README.md
add citation format to the arxiv-version paper
2024-12-30 11:04:14 +08:00
Ikko Eltociear Ashimine
710c8b8b6e
docs: update README.md
HuggingFace -> Hugging Face
2024-12-29 00:43:11 +09:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
* handle missing scale_inv_name

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict.

* sort filename to reduce memory costs

* Add CUDA cache clearing in memory management

Added torch.cuda.empty_cache() to free up unused memory on the GPU.
2024-12-27 09:34:38 +08:00
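The fix this commit describes, tolerating a `weight_scale_inv` tensor that lives in a different SafeTensors shard, might be sketched like this (pure-Python stand-in; the `dequant` callable and key naming are illustrative):

```python
def dequantize_weights(state_dict, dequant=lambda w, s: [x * s for x in w]):
    """Dequantize each weight with its companion scale, passing through when absent."""
    out = {}
    for name, weight in state_dict.items():
        if name.endswith("_scale_inv"):
            continue  # scales are consumed alongside their weights, not emitted
        scale = state_dict.get(f"{name}_scale_inv")
        if scale is None:
            # Companion scale missing (possibly in another shard): keep weight as-is
            # instead of asserting, which is what previously crashed the conversion.
            out[name] = weight
        else:
            out[name] = dequant(weight, scale)
    return out
```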
Huang Panpan
c8087bd8b8
Merge pull request #9 from simon-mo/vllm
Docs: add vLLM as supported engine
2024-12-27 09:16:09 +08:00
simon-mo
e2c15caf04 add version
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:11:31 -08:00
simon-mo
cf47874d8e Docs: add vLLM as supported engine
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:10:33 -08:00
stack-heap-overflow
4c2fdb8f55 Release DeepSeek-V3 2024-12-26 19:01:57 +08:00
stack-heap-overflow
4b58dc6bfc
Initial commit 2024-12-26 17:52:41 +08:00