Cristian Cezar Moisés
6e1d0ed9c6
Update model.py
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Added docstrings and type hints throughout.
Improved formatting and structure to enhance readability.
2025-01-27 23:21:33 -03:00
Cristian Cezar Moisés
18323417c1
Update kernel.py
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
2025-01-27 23:19:26 -03:00
Cristian Cezar Moisés
ebbbf84d35
Update generate.py
Distributed Inference Enhancements:
Proper NCCL/Gloo backend selection
Distributed timeout handling
Rank-aware input broadcasting
Graceful process group cleanup
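The backend choice and timeout handling listed above can be sketched as a small helper. This is a minimal illustration, not the code from this commit. The helper name `distributed_settings`, the torchrun-style environment variables it reads, and the 30-minute default timeout are all assumptions.

```python
import os
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistSettings:
    backend: str
    world_size: int
    rank: int
    local_rank: int
    timeout: timedelta


def distributed_settings(cuda_available: bool,
                         env: Optional[Mapping[str, str]] = None,
                         timeout_minutes: int = 30) -> DistSettings:
    """Hypothetical helper: derive process-group settings from torchrun-style env vars."""
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK={rank} is out of range for WORLD_SIZE={world_size}")
    # NCCL requires CUDA devices; Gloo also runs on CPU.
    backend = "nccl" if cuda_available else "gloo"
    return DistSettings(backend, world_size, rank, local_rank,
                        timedelta(minutes=timeout_minutes))
```

With PyTorch, one would then call `s = distributed_settings(torch.cuda.is_available())`, run `torch.distributed.init_process_group(s.backend, timeout=s.timeout)` when `s.world_size > 1`, and put `torch.distributed.destroy_process_group()` in a `finally` block. Rank-aware input could be shared from rank 0 with `torch.distributed.broadcast_object_list`.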
Error Handling & Validation:
Comprehensive path validation
Config schema validation
Tokenization error handling
Batch processing safeguards
CUDA OOM fallback handling
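The commit does not say what the OOM fallback does. One common strategy is to split the batch in half and retry; the sketch below assumes that strategy. The function name and its parameters are hypothetical, and `MemoryError` stands in for the real OOM exception so the sketch runs without a GPU.

```python
from typing import Callable, List, Sequence, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_oom_fallback(
    batch: Sequence[T],
    step: Callable[[Sequence[T]], List[R]],
    oom_error: Type[BaseException] = MemoryError,
    on_oom: Callable[[], None] = lambda: None,
) -> List[R]:
    """Run `step` on the whole batch; on OOM, free memory and retry each half."""
    try:
        return step(batch)
    except oom_error:
        if len(batch) <= 1:
            raise  # even a single item does not fit, so give up
    # Retry outside the except block so the failed attempt's traceback is released first.
    on_oom()
    mid = len(batch) // 2
    return (run_with_oom_fallback(batch[:mid], step, oom_error, on_oom)
            + run_with_oom_fallback(batch[mid:], step, oom_error, on_oom))
```

With PyTorch, the natural arguments would be `oom_error=torch.cuda.OutOfMemoryError` and `on_oom=torch.cuda.empty_cache`. Results come back in the original order because the halves are concatenated left to right.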
Generation Improvements:
Top-k sampling support
Repetition penalty
Dynamic sequence length management
Progress tracking with tqdm
Sequence truncation warnings
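The generation features above are standard techniques. The framework-free sketch below illustrates them, but it does not reproduce this commit's code. The function names are hypothetical. The repetition penalty follows the common CTRL-style rule (divide positive logits by the penalty, multiply negative ones). A real implementation would work on tensors, for example with `torch.topk`, rather than on Python lists.

```python
import math
import random
import warnings
from typing import Iterable, List, Optional, Sequence


def apply_repetition_penalty(logits: Sequence[float], seen: Iterable[int],
                             penalty: float) -> List[float]:
    """Lower the scores of tokens that were already generated (penalty >= 1)."""
    out = list(logits)
    for tok in set(seen):
        # Dividing a positive logit, or multiplying a negative one, lowers it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def sample_top_k(logits: Sequence[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token id from the k highest-scoring logits only."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)  # greedy decoding
    rng = rng if rng is not None else random.Random()
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = logits[top[0]]  # subtract the max logit for numerical stability
    weights = [math.exp((logits[i] - m) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def clamp_new_tokens(prompt_len: int, max_new_tokens: int, max_seq_len: int) -> int:
    """Shrink the generation budget so prompt plus output fit the context window."""
    budget = max_seq_len - prompt_len
    if budget <= 0:
        raise ValueError(f"prompt length {prompt_len} exceeds max_seq_len {max_seq_len}")
    if max_new_tokens > budget:
        warnings.warn(f"max_new_tokens truncated from {max_new_tokens} to {budget}")
    return min(max_new_tokens, budget)
```

At each step, the loop would apply the penalty to the current logits, sample with `sample_top_k`, and stop after the number of tokens returned by `clamp_new_tokens`.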
Performance Optimizations:
Device-aware tensor placement
Batch tokenization
Memory-efficient generation loop
Model parallelism support
User Experience:
Interactive mode enhancements:
Command history
Input validation
Graceful exit handling
Batch processing:
Progress tracking
Error resilience
Clean output formatting
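On POSIX systems, `import readline` is enough to give `input()` line editing and up-arrow recall. The hypothetical loop below focuses on the other interactive items instead: input validation, an explicit conversation history, and leaving cleanly on Ctrl-D or Ctrl-C. The `/exit` and `/clear` commands and the 4096-character limit are assumptions for illustration.

```python
from typing import Callable, List


def interactive_loop(read_line: Callable[[str], str],
                     respond: Callable[[List[str]], str],
                     write: Callable[[str], None] = print,
                     max_chars: int = 4096) -> List[str]:
    """Hypothetical chat REPL: validates input, keeps history, exits cleanly."""
    history: List[str] = []
    while True:
        try:
            prompt = read_line(">>> ").strip()
        except (EOFError, KeyboardInterrupt):
            write("")  # Ctrl-D / Ctrl-C: leave without a traceback
            break
        if prompt == "/exit":
            break
        if prompt == "/clear":
            history.clear()
            continue
        if not prompt:
            continue  # ignore empty or whitespace-only lines
        if len(prompt) > max_chars:
            write(f"Input too long ({len(prompt)} > {max_chars} chars); please shorten it.")
            continue
        history.append(prompt)
        write(respond(history))
    return history
```

Passing `input` as `read_line` gives a terminal session. Injecting the reader and writer keeps the loop testable without a console.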
Code Quality:
Type hints throughout
Configurable constants
Modular architecture
Docstrings with examples
Logging integration
Safety Features:
Tokenizer trust_remote_code handling
Config validation
Input sanitization
Resource cleanup guarantees
2025-01-27 23:16:21 -03:00
Cristian Cezar Moisés
eee820cc36
Update fp8_cast_bf16.py
Type Hints & Path Management:
Added comprehensive type annotations
Used pathlib.Path for safer path handling
Enhanced Error Handling:
Structured exception handling throughout
Clear error messages with context
Safe resource cleanup
Memory Management:
LRU cache implementation with OrderedDict
Configurable cache size
Explicit GPU memory cleanup
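The sketch below shows an `OrderedDict`-based LRU cache for loaded shards. It assumes the loader is something like `safetensors.torch.load_file(path, device="cuda")` and that eviction is the point where GPU memory would be released, for example with `torch.cuda.empty_cache()`. The class name, the `capacity` default, and the `on_evict` hook are illustrative, not the commit's code.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Keep at most `capacity` loaded items, evicting the least recently used."""

    def __init__(self, loader: Callable[[K], V], capacity: int = 2,
                 on_evict: Callable[[K, V], None] = lambda k, v: None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._loader = loader
        self._capacity = capacity
        self._on_evict = on_evict
        self._items: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._loader(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            old_key, old_value = self._items.popitem(last=False)  # oldest entry
            self._on_evict(old_key, old_value)
        return value

    def __len__(self) -> int:
        return len(self._items)

    def __contains__(self, key: object) -> bool:
        return key in self._items
```

`move_to_end` and `popitem(last=False)` are both O(1), so every lookup stays constant-time regardless of how many shards the checkpoint has.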
Logging System:
Configurable logging levels
Detailed progress tracking
Structured error reporting
Code Organization:
Split into focused, testable functions
Clear separation of concerns
Documented public methods
Validation & Safety:
Input path validation
Weight type checking
Clone tensors to prevent reference issues
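Input path validation with `pathlib.Path` might look like the sketch below. It assumes a sharded Hugging Face-style checkpoint with a `model.safetensors.index.json` file. The function name and the specific checks are illustrative choices, not taken from the commit.

```python
from pathlib import Path
from typing import Tuple

INDEX_FILENAME = "model.safetensors.index.json"  # sharded-checkpoint index (assumed layout)


def validate_paths(input_dir: str, output_dir: str) -> Tuple[Path, Path]:
    """Resolve and check the conversion's input and output directories."""
    src = Path(input_dir).expanduser().resolve()
    dst = Path(output_dir).expanduser().resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"input directory not found: {src}")
    if not (src / INDEX_FILENAME).is_file():
        raise FileNotFoundError(f"missing {INDEX_FILENAME} in {src}")
    if not any(src.glob("*.safetensors")):
        raise FileNotFoundError(f"no .safetensors shards in {src}")
    if dst == src:
        raise ValueError("output directory must differ from the input directory")
    dst.mkdir(parents=True, exist_ok=True)  # create nested output dirs if needed
    return src, dst
```

Resolving both paths before comparing them catches the case where the output directory is the input directory reached through a different relative path.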
Performance:
Optimized file loading with LRU cache
Batched tensor processing
Asynchronous CUDA operations
Metadata & Traceability:
Added conversion metadata to output files
Preserved original index structure
Enhanced output index information
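The safetensors `save_file(tensors, filename, metadata=...)` function accepts only string-to-string metadata, so conversion details must be stringified. The sketch below builds per-file metadata and the output index. The key names (`converted_from`, `converted_to`, `converted_at`), the FP8 dtype name, and the choice to drop `_scale_inv` entries once weights are dequantized are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable


def conversion_metadata(source_dtype: str = "float8_e4m3fn",
                        target_dtype: str = "bfloat16") -> Dict[str, str]:
    """Per-file metadata; safetensors file metadata must map str -> str."""
    return {
        "converted_from": source_dtype,
        "converted_to": target_dtype,
        "converted_at": datetime.now(timezone.utc).isoformat(),
    }


def build_output_index(original_index: dict, converted_weights: Iterable[str]) -> dict:
    """Copy the original index, dropping scale_inv entries for dequantized weights."""
    weight_map = dict(original_index["weight_map"])
    for name in converted_weights:
        weight_map.pop(f"{name}_scale_inv", None)  # the scale is folded into the weight
    # total_size no longer holds once FP8 weights are widened to BF16, so drop it.
    metadata = {k: v for k, v in original_index.get("metadata", {}).items()
                if k != "total_size"}
    metadata.update(conversion_metadata())
    return {"metadata": metadata, "weight_map": weight_map}
```

Copying `weight_map` before editing it leaves the original index untouched. This matters if it is still needed to locate tensors during the conversion.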
Configuration:
Centralized constants
Device-aware execution (CUDA/CPU)
Progress Tracking:
Nested progress bars
Detailed file processing status
2025-01-27 23:13:11 -03:00
Cristian Cezar Moisés
a26fca4a41
Update convert.py
2025-01-27 23:10:08 -03:00
enoch kan
bc77f22afc
Updated model.py docstrings
2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e
Enhance documentation and update .gitignore for model conversion scripts
2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa
torch rmsnorm
2025-01-05 14:33:48 +08:00
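The "torch rmsnorm" title is terse. Judging by the name, it concerns computing RMSNorm with PyTorch itself, which since version 2.4 provides `torch.nn.functional.rms_norm(input, normalized_shape, weight, eps)` and `torch.nn.RMSNorm`. The plain-Python sketch below shows what RMSNorm computes over the last dimension. The `eps=1e-6` default is an assumption.

```python
import math
from typing import List, Optional, Sequence


def rms_norm(x: Sequence[float], weight: Optional[Sequence[float]] = None,
             eps: float = 1e-6) -> List[float]:
    """y_i = x_i / sqrt(mean(x^2) + eps) * w_i over one vector."""
    if weight is not None and len(weight) != len(x):
        raise ValueError("weight must match the normalized dimension")
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    w = weight if weight is not None else [1.0] * len(x)
    return [v * inv_rms * g for v, g in zip(x, w)]
```

Unlike LayerNorm, RMSNorm neither subtracts the mean nor adds a bias. It only rescales each vector to unit root-mean-square before applying the learned gain.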
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py
2024-12-31 18:05:55 +08:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
* handle missing scale_inv_name
Fixed an issue where a `weight` and its `weight_scale_inv` (e.g., `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were stored in different safetensors files, which caused an assertion error because `scale_inv_name` was not in the `state_dict` of the file being processed.
* sort filenames to reduce memory costs
* Add CUDA cache clearing in memory management
Added torch.cuda.empty_cache() to free up unused memory on the GPU,
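The fix described above amounts to looking up a weight's scale through the checkpoint index when it is not in the current shard. The sketch below shows that lookup. The function name and the `load_tensor(file, name)` callback are hypothetical; with safetensors, the callback could use `safe_open(path, framework="pt")` and its `get_tensor(name)` method. One plausible reading of the "sort filenames" change is that when a weight and its scale are split, they tend to land in adjacent shards, so processing shards in sorted order lets a small cache of recent files serve the lookup.

```python
from typing import Callable, Dict


def find_scale_inv(weight_name: str, current_file: str, current_state: Dict[str, object],
                   weight_map: Dict[str, str],
                   load_tensor: Callable[[str, str], object]) -> object:
    """Return `<weight>_scale_inv`, even if it lives in another shard."""
    scale_name = f"{weight_name}_scale_inv"
    if scale_name in current_state:  # common case: same file as the weight
        return current_state[scale_name]
    shard = weight_map.get(scale_name)  # the index maps tensor name -> shard file
    if shard is None:
        raise KeyError(f"{scale_name} not found in the index")
    if shard == current_file:
        raise KeyError(f"index lists {scale_name} in {shard}, but it is missing there")
    return load_tensor(shard, scale_name)
```

Raising a `KeyError` with the tensor name, rather than asserting, makes a genuinely missing scale easy to diagnose.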
2024-12-27 09:34:38 +08:00
stack-heap-overflow
4c2fdb8f55
Release DeepSeek-V3
2024-12-26 19:01:57 +08:00