Error Handling: Improved error handling for file operations and JSON loading.
Logging: Clearer logging messages for better debugging and monitoring.
Code Comments: Added more descriptive comments to enhance code readability.
Dynamic Sequence Management: Clarified the logic for managing prompt lengths and token generation.
Performance: Minor optimizations in code structure and logic flow for better performance.
Code Structure: Organized functions and constants for better readability and maintainability.
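
As a rough illustration of the error handling described for file operations and JSON loading, the sketch below wraps `json.load` so that failures name the offending file. The helper name `load_json_file` and the exact messages are assumptions for illustration, not the code from this change.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger(__name__)


def load_json_file(path: str) -> Dict[str, Any]:
    """Load a JSON object from disk (hypothetical helper).

    Missing files re-raise FileNotFoundError; malformed JSON is re-raised as
    ValueError with the file name included in the message.
    """
    file_path = Path(path)
    try:
        with file_path.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        logger.error("JSON file not found: %s", file_path)
        raise
    except json.JSONDecodeError as exc:
        logger.error(
            "Invalid JSON in %s (line %d, column %d): %s",
            file_path, exc.lineno, exc.colno, exc.msg,
        )
        raise ValueError(f"Could not parse {file_path}: {exc.msg}") from exc

    # A config file is expected to hold a JSON object, not a list or scalar.
    if not isinstance(data, dict):
        raise ValueError(
            f"Expected a JSON object in {file_path}, got {type(data).__name__}"
        )
    return data
```

Chaining with `from exc` keeps the original decoder error in the traceback while the top-level message stays readable.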
Increased Clarity: Added more comments and detailed docstrings to improve clarity and maintainability.
Efficient Dictionary Comprehension: Used a dictionary comprehension to filter out `None` values in `new_state_dict`.
Safe Dictionary Modification: Used `pop` with a default value to safely remove keys from the dictionary without raising exceptions.
Consistent Type Hinting: Enhanced type hints for better clarity and consistency.
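
The two dictionary changes above can be sketched together. This is a minimal example, assuming `new_state_dict` maps tensor names to values that may be `None`; the function name `clean_state_dict` and the `keys_to_drop` parameter are hypothetical.

```python
from typing import Dict, Iterable, Optional


def clean_state_dict(
    new_state_dict: Dict[str, Optional[object]],
    keys_to_drop: Iterable[str],
) -> Dict[str, object]:
    """Return a copy without None values and without the listed keys."""
    # The comprehension builds a new dict, so the input is not mutated.
    filtered = {
        name: value for name, value in new_state_dict.items() if value is not None
    }
    for key in keys_to_drop:
        # pop(key, None) is a no-op when the key is absent, whereas
        # `del filtered[key]` or pop(key) would raise KeyError.
        filtered.pop(key, None)
    return filtered
```

For example, `clean_state_dict({"a": 1, "b": None, "c": 3}, ["c", "missing"])` returns `{"a": 1}`.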
Refactored File Copying: The token file copying logic is now encapsulated in its own function, `copy_token_files`.
Improved Logging: Added more context to the logs to enhance debugging capabilities.
Type Hints: Ensured that all functions have clear type hints.
Error Handling: Improved error messages to provide more insight.
Code Readability: Improved overall readability by breaking down complex functions into simpler helper functions.
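
A helper like `copy_token_files` might look roughly as follows. The signature, the `*token*` glob pattern, and the log messages are assumptions made for this sketch; only the function name comes from the notes above.

```python
import logging
import shutil
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)


def copy_token_files(src_dir: str, dst_dir: str, pattern: str = "*token*") -> List[Path]:
    """Copy files matching `pattern` from src_dir to dst_dir (assumed signature)."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Source directory does not exist: {src}")
    dst.mkdir(parents=True, exist_ok=True)

    copied: List[Path] = []
    for file_path in sorted(src.glob(pattern)):
        if not file_path.is_file():
            continue  # skip directories that happen to match the pattern
        target = dst / file_path.name
        shutil.copyfile(file_path, target)
        logger.info("Copied %s -> %s", file_path, target)
        copied.append(target)

    if not copied:
        logger.warning("No files matching %r found in %s", pattern, src)
    return copied
```

Returning the list of copied paths lets the caller log or verify the result without re-scanning the destination directory.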
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Ensured proper docstrings and types for clarity.
Improved formatting and structure to enhance readability.
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
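
To show what specific validation messages could look like for the prompt-length and token-generation logic, here is a small sketch. The function name, parameter names, and the rule that the total length is capped at `max_seq_len` are assumptions for illustration, not taken from the change itself.

```python
from typing import Sequence


def validate_generation_request(
    prompt_lengths: Sequence[int],
    max_new_tokens: int,
    max_seq_len: int,
) -> int:
    """Validate inputs and return the total sequence length to allocate."""
    if not prompt_lengths:
        raise ValueError("prompt_lengths must contain at least one prompt")
    if max_new_tokens <= 0:
        raise ValueError(f"max_new_tokens must be positive, got {max_new_tokens}")

    longest = max(prompt_lengths)
    if longest > max_seq_len:
        raise ValueError(
            f"Longest prompt has {longest} tokens, "
            f"which exceeds max_seq_len={max_seq_len}"
        )
    # Generation stops at whichever limit is reached first.
    return min(max_seq_len, longest + max_new_tokens)
```

Each message states which argument failed and the value received, so the caller does not need to read the source to diagnose a bad request.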
* handle missing scale_inv_name
Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were stored in different safetensors files, which triggered an assertion error because `scale_inv_name` was missing from the `state_dict`.
* sort filenames to reduce memory costs
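
One way to handle a scale tensor that lives in a different shard is to fall back to a name-to-file index instead of asserting. The sketch below uses plain Python objects in place of tensors; `find_scale_inv`, `weight_map`, and `load_shard` are hypothetical names, and the presence of such an index for this checkpoint is an assumption.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def find_scale_inv(
    weight_name: str,
    state_dict: Dict[str, object],
    weight_map: Dict[str, str],
    load_shard: Callable[[str], Dict[str, object]],
) -> Optional[object]:
    """Look up `<weight_name>_scale_inv`, searching other shards if needed.

    Returns None (instead of failing an assertion) when the scale tensor
    cannot be found anywhere, so the caller can decide how to proceed.
    """
    scale_inv_name = f"{weight_name}_scale_inv"

    # Common case: weight and scale are in the shard already loaded.
    if scale_inv_name in state_dict:
        return state_dict[scale_inv_name]

    # Fallback: ask the index which shard file holds the scale tensor.
    shard_file = weight_map.get(scale_inv_name)
    if shard_file is None:
        logger.warning("Missing scale tensor %s; skipping", scale_inv_name)
        return None
    return load_shard(shard_file)[scale_inv_name]
```

Returning `None` leaves the policy (skip the weight, keep it unconverted, or abort) to the caller rather than hard-coding it in the lookup.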
* Add CUDA cache clearing in memory management
Added torch.cuda.empty_cache() to free up unused memory on the GPU,