Cristian Cezar Moisés
6e1d0ed9c6
Update model.py
...
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Ensured proper docstrings and types for clarity.
Improved formatting and structure to enhance readability.
2025-01-27 23:21:33 -03:00
Cristian Cezar Moisés
18323417c1
Update kernel.py
...
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
2025-01-27 23:19:26 -03:00
Cristian Cezar Moisés
ebbbf84d35
Update generate.py
...
Distributed Training Enhancements:
* Proper NCCL/Gloo backend selection
* Distributed timeout handling
* Rank-aware input broadcasting
* Graceful process group cleanup
Error Handling & Validation:
* Comprehensive path validation
* Config schema validation
* Tokenization error handling
* Batch processing safeguards
* CUDA OOM fallback handling
Generation Improvements:
* Top-k sampling support
* Repetition penalty
* Dynamic sequence length management
* Progress tracking with tqdm
* Sequence truncation warnings
Performance Optimizations:
* Device-aware tensor placement
* Batch tokenization
* Memory-efficient generation loop
* Model parallelism support
User Experience:
* Interactive mode enhancements:
  * Command history
  * Input validation
  * Graceful exit handling
* Batch processing:
  * Progress tracking
  * Error resilience
  * Clean output formatting
Code Quality:
* Type hints throughout
* Configurable constants
* Modular architecture
* Docstrings with examples
* Logging integration
Safety Features:
* Tokenizer trust_remote_code handling
* Config validation
* Input sanitization
* Resource cleanup guarantees
2025-01-27 23:16:21 -03:00
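For illustration, the "Top-k sampling support" and "Repetition penalty" items in the generate.py commit above could be combined roughly as follows. This is a minimal sketch in pure Python over a list of logits, not the code from the commit; the function names, the default values, and the divide-or-multiply penalty rule are assumptions chosen for the example.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def apply_repetition_penalty(
    logits: Sequence[float], previous_tokens: Sequence[int], penalty: float
) -> List[float]:
    """Make already generated tokens less likely (assumed rule: shrink
    positive logits by dividing, push negative logits down by multiplying)."""
    out = list(logits)
    for token in set(previous_tokens):
        out[token] = out[token] / penalty if out[token] > 0 else out[token] * penalty
    return out


def top_k_probs(logits: Sequence[float], k: int) -> Dict[int, float]:
    """Softmax restricted to the k highest-scoring token ids."""
    k = max(1, min(k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_next_token(
    logits: Sequence[float],
    previous_tokens: Sequence[int],
    k: int = 50,
    penalty: float = 1.2,
    rng: Optional[random.Random] = None,
) -> int:
    """Penalize repeats, keep the top k candidates, then sample one token id."""
    rng = rng or random.Random()
    probs = top_k_probs(apply_repetition_penalty(logits, previous_tokens, penalty), k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With `k=1` the sketch reduces to greedy decoding, and with `penalty=1.0` the penalty step leaves the logits unchanged.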
Cristian Cezar Moisés
eee820cc36
Update fp8_cast_bf16.py
...
Type Hints & Path Management:
* Added comprehensive type annotations
* Used pathlib.Path for safer path handling
Enhanced Error Handling:
* Structured exception handling throughout
* Clear error messages with context
* Safe resource cleanup
Memory Management:
* LRU cache implementation with OrderedDict
* Configurable cache size
* Explicit GPU memory cleanup
Logging System:
* Configurable logging levels
* Detailed progress tracking
* Structured error reporting
Code Organization:
* Split into focused, testable functions
* Clear separation of concerns
* Documented public methods
Validation & Safety:
* Input path validation
* Weight type checking
* Clone tensors to prevent reference issues
Performance:
* Optimized file loading with LRU cache
* Batched tensor processing
* Asynchronous CUDA operations
Metadata & Traceability:
* Added conversion metadata to output files
* Preserved original index structure
* Enhanced output index information
Configuration:
* Centralized constants
* Device-aware execution (CUDA/CPU)
Progress Tracking:
* Nested progress bars
* Detailed file processing status
2025-01-27 23:13:11 -03:00
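For illustration, the "LRU cache implementation with OrderedDict" and "Configurable cache size" items in the fp8_cast_bf16.py commit above could look roughly like this. It is a minimal sketch, not the code from the commit: `LRUFileCache`, `loader`, `max_size`, and `cached_files` are hypothetical names, and the loader is any callable that turns a file name into loaded data.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

V = TypeVar("V")


class LRUFileCache(Generic[V]):
    """Keep at most `max_size` loaded files, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], V], max_size: int = 2) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._loader = loader
        self._max_size = max_size
        self._entries: "OrderedDict[str, V]" = OrderedDict()

    def get(self, file_name: str) -> V:
        if file_name in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(file_name)
            return self._entries[file_name]
        # Cache miss: load the file, then evict the oldest entry if over budget.
        value = self._loader(file_name)
        self._entries[file_name] = value
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return value

    def cached_files(self) -> List[str]:
        """File names currently held, from least to most recently used."""
        return list(self._entries)
```

In a GPU setting, the eviction branch is where a script might also release device memory, which would correspond to the "Explicit GPU memory cleanup" item; that step is left out here so the sketch runs without any GPU library.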
Cristian Cezar Moisés
a26fca4a41
Update convert.py
2025-01-27 23:10:08 -03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
...
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966
docs: Update system requirements with GitHub Markdown callout
2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef
docs: Improve system requirements section formatting
2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e
docs: Add system requirements for DeepSeek-Infer demo
2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
...
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
...
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
...
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo.
2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279
Add CITATION.cff to provide citation metadata
...
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc
Updated model.py docstrings
2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e
Enhance documentation and update .gitignore for model conversion scripts
2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa
torch rmsnorm
2025-01-05 14:33:48 +08:00
Xingkai Yu
9b288b86cc
Update README.md
2025-01-03 15:30:48 +08:00
Huang Panpan
0d16ea24c8
Merge pull request #206 from kutt/patch-1
...
use alert formatting for notes in readme
2025-01-03 09:48:03 +08:00
kutt
21bc231f32
use alert formatting for notes in readme
2025-01-02 15:02:52 +01:00
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py
2024-12-31 18:05:55 +08:00
Huang Panpan
7c2466b310
Update issue templates
2024-12-31 14:49:05 +08:00
Huang Panpan
1b8e18cc29
Merge pull request #21 from eltociear/patch-1
...
docs: update README.md
2024-12-30 15:03:30 +08:00
Haswell Iris
94410f8d58
Merge pull request #33 from zhyncs/main
...
docs: update SGLang usage
2024-12-30 14:37:38 +08:00
zhyncs
68d0061937
upd
2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf
upd
2024-12-30 14:21:00 +08:00
zhyncs
a1edf4138e
upd
2024-12-30 14:18:00 +08:00
zhyncs
8638950ec2
docs: update SGLang usage
2024-12-30 14:13:27 +08:00
DeepSeekDDM
83dd18eda4
Update README.md
...
add citation format to the arxiv-version paper
2024-12-30 11:04:14 +08:00
Ikko Eltociear Ashimine
710c8b8b6e
docs: update README.md
...
HuggingFace -> Hugging Face
2024-12-29 00:43:11 +09:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
...
* handle missing scale_inv_name
  Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same safetensors file, which caused an assertion error because `scale_inv_name` was not in the `state_dict`.
* sort filename to reduce memory costs
* Add CUDA cache clearing in memory management
  Added `torch.cuda.empty_cache()` to free up unused memory on the GPU.
2024-12-27 09:34:38 +08:00
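For illustration, the situation described in the commit above (a weight and its `_scale_inv` tensor stored in different files) can be handled by looking the scale up through a name-to-file index instead of asserting that it sits in the current file. The following is a minimal sketch under stated assumptions, not the code from the commit: plain floats stand in for tensors, a scalar multiply stands in for the real dequantization, and `find_tensor`, `dequantize_shard`, `weight_map`, and `shards` are hypothetical names.

```python
from typing import Dict, Optional

Shard = Dict[str, float]  # stand-in for one file's tensor-name -> tensor mapping


def find_tensor(
    name: str, weight_map: Dict[str, str], shards: Dict[str, Shard]
) -> Optional[float]:
    """Look a tensor up in whichever shard the index says holds it."""
    shard_name = weight_map.get(name)
    if shard_name is None:
        return None
    return shards[shard_name].get(name)


def dequantize_shard(
    shard_name: str, weight_map: Dict[str, str], shards: Dict[str, Shard]
) -> Dict[str, float]:
    """Convert one shard, tolerating scales that live elsewhere or are missing."""
    converted: Dict[str, float] = {}
    for name in sorted(shards[shard_name]):
        if name.endswith("_scale_inv"):
            continue  # scales are consumed with their weight, not copied over
        value = shards[shard_name][name]
        scale = find_tensor(name + "_scale_inv", weight_map, shards)
        # Assumption: a weight with no scale anywhere is kept unchanged
        # rather than triggering an assertion error.
        converted[name] = value if scale is None else value * scale
    return converted
```

Processing file names in sorted order, as the commit also mentions, tends to keep a weight and its scale in neighbouring files, so fewer files need to be held in memory at once.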
Huang Panpan
c8087bd8b8
Merge pull request #9 from simon-mo/vllm
...
Docs: add vLLM as supported engine
2024-12-27 09:16:09 +08:00
simon-mo
e2c15caf04
add version
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:11:31 -08:00
simon-mo
cf47874d8e
Docs: add vLLM as supported engine
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:10:33 -08:00
stack-heap-overflow
4c2fdb8f55
Release DeepSeek-V3
2024-12-26 19:01:57 +08:00
stack-heap-overflow
4b58dc6bfc
Initial commit
2024-12-26 17:52:41 +08:00