DeepSeek-V3/inference
Cristian Cezar Moisés eee820cc36
Update fp8_cast_bf16.py

    Type Hints & Path Management:
        Added comprehensive type annotations
        Used pathlib.Path for safer path handling

    Enhanced Error Handling:
        Structured exception handling throughout
        Clear error messages with context
        Safe resource cleanup

    Memory Management:
        LRU cache implementation with OrderedDict
        Configurable cache size
        Explicit GPU memory cleanup
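
A minimal sketch of the OrderedDict-based LRU cache with a configurable size that the list above describes. The class name `LRUFileCache`, the `loader` callback, and the default `max_size` are assumptions for illustration and are not taken from fp8_cast_bf16.py. The sketch only drops the evicted entry; the script itself presumably also releases GPU memory at that point, which is left out here so the example runs without PyTorch.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUFileCache(Generic[K, V]):
    """Keep at most `max_size` loaded items, evicting the least recently used."""

    def __init__(self, loader: Callable[[K], V], max_size: int = 2) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._loader = loader        # hypothetical: e.g. a function that loads a weight file
        self._max_size = max_size    # configurable cache size
        self._entries: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: load the item, then evict the oldest entry if over capacity.
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return value

    def keys(self) -> List[K]:
        """Return cached keys, ordered from least to most recently used."""
        return list(self._entries)
```

With `max_size=2`, requesting `a`, `b`, `a`, `c` in that order evicts `b`, because the second request for `a` moved it to the most-recently-used position.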

    Logging System:
        Configurable logging levels
        Detailed progress tracking
        Structured error reporting

    Code Organization:
        Split into focused, testable functions
        Clear separation of concerns
        Documented public methods

    Validation & Safety:
        Input path validation
        Weight type checking
        Clone tensors to prevent reference issues
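
The input path validation item could look roughly like the sketch below, which uses pathlib.Path as the commit message mentions. The function name `validate_paths` and the `INDEX_FILENAME` constant are assumptions for illustration; the checks the script actually performs may differ. Weight type checking and tensor cloning are not shown.

```python
from pathlib import Path
from typing import Tuple, Union

# Assumed name of the checkpoint index file; adjust to match the real checkpoint layout.
INDEX_FILENAME = "model.safetensors.index.json"


def validate_paths(
    input_dir: Union[str, Path], output_dir: Union[str, Path]
) -> Tuple[Path, Path]:
    """Check the input directory and its index file, then create the output directory."""
    src = Path(input_dir)
    dst = Path(output_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Input directory not found: {src}")
    index_file = src / INDEX_FILENAME
    if not index_file.is_file():
        raise FileNotFoundError(f"Missing index file: {index_file}")
    # Create the output directory (and any parents) only after the input checks pass.
    dst.mkdir(parents=True, exist_ok=True)
    return src, dst
```

Failing before any output is created means a mistyped input path does not leave an empty output directory behind.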

    Performance:
        Optimized file loading with LRU cache
        Batched tensor processing
        Asynchronous CUDA operations

    Metadata & Traceability:
        Added conversion metadata to output files
        Preserved original index structure
        Enhanced output index information
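
One way to add conversion metadata while preserving the original index structure is sketched below. The `conversion` key, its field names, and the function name `add_conversion_metadata` are hypothetical; the commit message does not say which fields the script actually writes.

```python
import copy
import json
from datetime import datetime, timezone
from typing import Any, Dict


def add_conversion_metadata(
    index: Dict[str, Any], source_dtype: str, target_dtype: str
) -> Dict[str, Any]:
    """Return a copy of `index` with a conversion record added under "metadata"."""
    out = copy.deepcopy(index)  # leave the caller's index untouched
    metadata = out.setdefault("metadata", {})
    metadata["conversion"] = {  # hypothetical key and field names
        "source_dtype": source_dtype,
        "target_dtype": target_dtype,
        "converted_at": datetime.now(timezone.utc).isoformat(),
    }
    return out


if __name__ == "__main__":
    original = {"metadata": {"total_size": 10}, "weight_map": {"w": "part-1.safetensors"}}
    print(json.dumps(add_conversion_metadata(original, "fp8", "bf16"), indent=2))
```

Existing keys such as `weight_map` pass through unchanged, so tools that read the original index can read the output index as well.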

    Configuration:
        Centralized constants
        Device-aware execution (CUDA/CPU)

    Progress Tracking:
        Nested progress bars
        Detailed file processing status
2025-01-27 23:13:11 -03:00
Name              Last commit                                                               Date
configs           Release DeepSeek-V3                                                       2024-12-26 19:01:57 +08:00
convert.py        Update convert.py                                                         2025-01-27 23:10:08 -03:00
fp8_cast_bf16.py  Update fp8_cast_bf16.py                                                   2025-01-27 23:13:11 -03:00
generate.py       Enhance documentation and update .gitignore for model conversion scripts  2025-01-05 18:18:18 +00:00
kernel.py         Enhance documentation and update .gitignore for model conversion scripts  2025-01-05 18:18:18 +00:00
model.py          Updated model.py docstrings                                               2025-01-05 18:24:31 +00:00
requirements.txt  Release DeepSeek-V3                                                       2024-12-26 19:01:57 +08:00