Commit Graph

38 Commits

Cristian Cezar Moisés
e1ed2e8465
Update fp8_cast_bf16.py
Increased Clarity: Added more comments and detailed docstrings to improve clarity and maintainability.
Efficient Dictionary Comprehension: Used dictionary comprehension to filter out None values in new_state_dict.
Safe Dictionary Modification: Used pop with a default value to safely remove keys from the dictionary without raising exceptions.
Consistent Type Hinting: Enhanced type hints for better clarity and consistency.
2025-01-27 23:24:46 -03:00
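The two dictionary idioms this commit describes might look like the following sketch; the names (`filter_none`, the sample keys) are illustrative, not taken from the repo:

```python
def filter_none(state_dict: dict) -> dict:
    """Drop entries whose value is None via a dictionary comprehension."""
    return {k: v for k, v in state_dict.items() if v is not None}


state_dict = {"weight": [1.0, 2.0], "weight_scale_inv": None}
new_state_dict = filter_none(state_dict)

# pop with a default never raises KeyError, even when the key is absent.
removed = new_state_dict.pop("weight_scale_inv", None)
```

Compared with `del d[key]`, `d.pop(key, None)` makes the "key may already be gone" case explicit and exception-free.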
Cristian Cezar Moisés
6e51b03eb1
Update convert.py
Refactored File Copying: The token file copying logic is now encapsulated in its own function, copy_token_files.
Improved Logging: Added more context to the logs to enhance debugging capabilities.
Type Hints: Ensured that all functions have clear type hints.
Error Handling: Improved error messages to provide more insight.
Code Readability: Improved overall readability by breaking down complex functions into simpler helper functions.
2025-01-27 23:23:28 -03:00
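A minimal sketch of the encapsulated `copy_token_files` helper with contextual logging, assuming tokenizer files are matched by a simple glob pattern (the pattern and signature are guesses, not the repo's actual code):

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def copy_token_files(src_dir: Path, dst_dir: Path) -> None:
    """Copy tokenizer-related files from src_dir to dst_dir, logging each copy."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in src_dir.glob("token*"):
        target = dst_dir / path.name
        shutil.copy2(path, target)  # copy2 preserves file metadata
        logger.info("Copied %s -> %s", path, target)
```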
Cristian Cezar Moisés
6e1d0ed9c6
Update model.py
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Ensured proper docstrings and types for clarity.
Improved formatting and structure to enhance readability.
2025-01-27 23:21:33 -03:00
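The "constants for magic values" plus "initialize distributed settings" pattern could be sketched as below; the constant names and environment-variable fallbacks are assumptions for illustration, not the repo's actual implementation:

```python
import os

# Named constants instead of magic values scattered through the code.
DEFAULT_WORLD_SIZE = 1
DEFAULT_RANK = 0


def get_distributed_settings() -> tuple[int, int]:
    """Read world size and rank from the environment, defaulting to single-process."""
    world_size = int(os.environ.get("WORLD_SIZE", DEFAULT_WORLD_SIZE))
    rank = int(os.environ.get("RANK", DEFAULT_RANK))
    assert 0 <= rank < world_size, f"rank {rank} out of range for world size {world_size}"
    return world_size, rank
```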
Cristian Cezar Moisés
18323417c1
Update kernel.py
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
2025-01-27 23:19:26 -03:00
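"Specific error messages for input validation" might look like the following sketch (the function name and the multiple-of-16 rule are hypothetical stand-ins for whatever kernel.py actually checks):

```python
def validate_block_size(block_size: int) -> None:
    """Reject invalid inputs with messages that name the offending value."""
    if not isinstance(block_size, int):
        raise TypeError(f"block_size must be an int, got {type(block_size).__name__}")
    if block_size <= 0 or block_size % 16 != 0:
        raise ValueError(f"block_size must be a positive multiple of 16, got {block_size}")
```

The payoff over a bare `assert` is that the exception text tells the user both the constraint and the value that violated it.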
Cristian Cezar Moisés
ebbbf84d35
Update generate.py
Distributed Training Enhancements:
        Proper NCCL/Gloo backend selection
        Distributed timeout handling
        Rank-aware input broadcasting
        Graceful process group cleanup

    Error Handling & Validation:
        Comprehensive path validation
        Config schema validation
        Tokenization error handling
        Batch processing safeguards
        CUDA OOM fallback handling

    Generation Improvements:
        Top-k sampling support
        Repetition penalty
        Dynamic sequence length management
        Progress tracking with tqdm
        Sequence truncation warnings

    Performance Optimizations:
        Device-aware tensor placement
        Batch tokenization
        Memory-efficient generation loop
        Model parallelism support

    User Experience:
        Interactive mode enhancements:
            Command history
            Input validation
            Graceful exit handling

        Batch processing:
            Progress tracking
            Error resilience
            Clean output formatting

    Code Quality:
        Type hints throughout
        Configurable constants
        Modular architecture
        Docstrings with examples
        Logging integration

    Safety Features:
        Tokenizer trust_remote_code handling
        Config validation
        Input sanitization
        Resource cleanup guarantees
2025-01-27 23:16:21 -03:00
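The top-k sampling with repetition penalty mentioned under "Generation Improvements" can be sketched in plain Python (a simplified stand-in for the tensor version; the penalty rule shown — divide positive logits, multiply negative ones — is one common convention, and all names here are illustrative):

```python
import math
import random


def top_k_sample(logits, k, generated, repetition_penalty=1.2, rng=None):
    """Penalize already-generated tokens, keep the k largest logits, sample via softmax."""
    rng = rng or random.Random(0)
    penalized = [
        l / repetition_penalty if (i in generated and l > 0)
        else l * repetition_penalty if i in generated
        else l
        for i, l in enumerate(logits)
    ]
    # Indices of the k largest penalized logits.
    top = sorted(range(len(penalized)), key=lambda i: penalized[i], reverse=True)[:k]
    exps = [math.exp(penalized[i]) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the top-k distribution.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```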
Cristian Cezar Moisés
eee820cc36
Update fp8_cast_bf16.py
Type Hints & Path Management:
        Added comprehensive type annotations
        Used pathlib.Path for safer path handling

    Enhanced Error Handling:
        Structured exception handling throughout
        Clear error messages with context
        Safe resource cleanup

    Memory Management:
        LRU cache implementation with OrderedDict
        Configurable cache size
        Explicit GPU memory cleanup

    Logging System:
        Configurable logging levels
        Detailed progress tracking
        Structured error reporting

    Code Organization:
        Split into focused, testable functions
        Clear separation of concerns
        Documented public methods

    Validation & Safety:
        Input path validation
        Weight type checking
        Clone tensors to prevent reference issues

    Performance:
        Optimized file loading with LRU cache
        Batched tensor processing
        Asynchronous CUDA operations

    Metadata & Traceability:
        Added conversion metadata to output files
        Preserved original index structure
        Enhanced output index information

    Configuration:
        Centralized constants
        Device-aware execution (CUDA/CPU)

    Progress Tracking:
        Nested progress bars
        Detailed file processing status
2025-01-27 23:13:11 -03:00
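The "LRU cache implementation with OrderedDict" for file loading could follow this shape (class and method names are illustrative, not the repo's):

```python
from collections import OrderedDict


class FileCache:
    """A small LRU cache keyed by filename, with a configurable size bound."""

    def __init__(self, max_size: int = 2):
        self.max_size = max_size
        self._cache: OrderedDict = OrderedDict()

    def get(self, name, loader):
        """Return the cached value for name, loading and evicting as needed."""
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        value = loader(name)
        self._cache[name] = value
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value
```

`OrderedDict` keeps insertion order, so `move_to_end` on a hit plus `popitem(last=False)` on overflow gives LRU eviction in a few lines.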
Cristian Cezar Moisés
a26fca4a41
Update convert.py 2025-01-27 23:10:08 -03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966 docs: Update system requirements with GitHub Markdown callout 2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef docs: Improve system requirements section formatting 2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e docs: Add system requirements for DeepSeek-Infer demo 2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo. 2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279 Add CITATION.cff to provide citation metadata
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc Updated model.py docstrings 2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e Enhance documentation and update .gitignore for model conversion scripts 2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa torch rmsnorm 2025-01-05 14:33:48 +08:00
Xingkai Yu
9b288b86cc
Update README.md 2025-01-03 15:30:48 +08:00
Huang Panpan
0d16ea24c8
Merge pull request #206 from kutt/patch-1
use alert formatting for notes in readme
2025-01-03 09:48:03 +08:00
kutt
21bc231f32
use alert formatting for notes in readme 2025-01-02 15:02:52 +01:00
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py 2024-12-31 18:05:55 +08:00
Huang Panpan
7c2466b310
Update issue templates 2024-12-31 14:49:05 +08:00
Huang Panpan
1b8e18cc29
Merge pull request #21 from eltociear/patch-1
docs: update README.md
2024-12-30 15:03:30 +08:00
Haswell Iris
94410f8d58
Merge pull request #33 from zhyncs/main
docs: update SGLang usage
2024-12-30 14:37:38 +08:00
zhyncs
68d0061937 upd 2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf upd 2024-12-30 14:21:00 +08:00
zhyncs
a1edf4138e upd 2024-12-30 14:18:00 +08:00
zhyncs
8638950ec2 docs: update SGLang usage 2024-12-30 14:13:27 +08:00
DeepSeekDDM
83dd18eda4
Update README.md
add citation format to the arxiv-version paper
2024-12-30 11:04:14 +08:00
Ikko Eltociear Ashimine
710c8b8b6e
docs: update README.md
HuggingFace -> Hugging Face
2024-12-29 00:43:11 +09:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
* handle missing scale_inv_name

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict.

* sort filename to reduce memory costs

* Add CUDA cache clearing in memory management

Added torch.cuda.empty_cache() to free up unused memory on the GPU.
2024-12-27 09:34:38 +08:00
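The fix this commit describes, tolerating a `weight_scale_inv` tensor that lives in a different SafeTensors shard, might be sketched like this (pure-Python stand-in; the `dequant` callable and key naming are illustrative):

```python
def dequantize_weights(state_dict, dequant=lambda w, s: [x * s for x in w]):
    """Dequantize each weight with its companion scale, passing through when absent."""
    out = {}
    for name, weight in state_dict.items():
        if name.endswith("_scale_inv"):
            continue  # scales are consumed alongside their weights, not emitted
        scale = state_dict.get(f"{name}_scale_inv")
        if scale is None:
            # Companion scale missing (possibly in another shard): keep weight as-is
            # instead of asserting, which is what previously crashed the conversion.
            out[name] = weight
        else:
            out[name] = dequant(weight, scale)
    return out
```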
Huang Panpan
c8087bd8b8
Merge pull request #9 from simon-mo/vllm
Docs: add vLLM as supported engine
2024-12-27 09:16:09 +08:00
simon-mo
e2c15caf04 add version
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:11:31 -08:00
simon-mo
cf47874d8e Docs: add vLLM as supported engine
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:10:33 -08:00
stack-heap-overflow
4c2fdb8f55 Release DeepSeek-V3 2024-12-26 19:01:57 +08:00
stack-heap-overflow
4b58dc6bfc
Initial commit 2024-12-26 17:52:41 +08:00