Refactored File Copying: The token file copying logic is now encapsulated in its own function, copy_token_files.
Improved Logging: Added more context to the logs to enhance debugging capabilities.
Type Hints: Ensured that all functions have clear type hints.
Error Handling: Improved error messages to provide more insight.
Code Readability: Improved overall readability by breaking down complex functions into simpler helper functions.
Introduced constants for magic values.
Created a function to initialize distributed settings.
Added assertions and comments for clarity.
Ensured proper docstrings and types for clarity.
Improved formatting and structure to enhance readability.
Improved docstrings for better understanding of the functions.
Added specific error messages for input validation.
Kept the structure of the code while making it easier to read and follow.
Ensured that all exceptions provide meaningful messages to the user.
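
For illustration, a helper like `copy_token_files` could look roughly like the sketch below. Only the function name comes from this description; the signature, the `TOKEN_FILE_MARKER` constant, and the rule that a "token file" is any file whose name contains `token` are assumptions, not the actual implementation.

```python
import logging
import shutil
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)

# Assumption: token-related files are recognised by this substring in the name.
TOKEN_FILE_MARKER = "token"


def copy_token_files(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Copy token-related files from src_dir to dst_dir and return the new paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    if not src_dir.is_dir():
        # Specific message so the user can see which path was wrong.
        raise FileNotFoundError(f"Source directory does not exist: {src_dir}")

    dst_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for path in sorted(src_dir.iterdir()):
        if path.is_file() and TOKEN_FILE_MARKER in path.name:
            target = dst_dir / path.name
            shutil.copy2(path, target)  # copy2 also preserves file metadata
            logger.info("Copied %s -> %s", path, target)
            copied.append(target)
    return copied
```

Returning the list of copied paths keeps the helper easy to log and to test in isolation.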
* handle missing scale_inv_name
Fixed an issue where a `weight` and its `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not stored in the same SafeTensor file, which caused an assertion error because `scale_inv_name` was not in the `state_dict`.
* sort filenames to reduce memory costs
* Add CUDA cache clearing in memory management
Added `torch.cuda.empty_cache()` to release unused cached memory on the GPU.
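
The three commits above can be sketched together as one loading loop. This is a minimal sketch under stated assumptions, not the actual change: `find_scale_inv`, `release_cached_memory`, `process_shards`, the `weight_map` index (tensor name to filename), and the `load_shard` callback are all hypothetical names, and plain dicts stand in for loaded shards so the example runs without PyTorch installed.

```python
import gc
from typing import Any, Callable, Dict, List, Optional

try:
    import torch
except ImportError:  # the sketch still runs without PyTorch
    torch = None

SCALE_INV_SUFFIX = "_scale_inv"
Shard = Dict[str, Any]


def find_scale_inv(weight_name: str, current_shard: Shard,
                   weight_map: Dict[str, str],
                   load_shard: Callable[[str], Shard]) -> Optional[Any]:
    """Return the scale tensor for weight_name, or None if it exists nowhere."""
    scale_inv_name = weight_name + SCALE_INV_SUFFIX
    if scale_inv_name in current_shard:
        return current_shard[scale_inv_name]
    # The scale may live in a different file; consult the (assumed) index.
    other_file = weight_map.get(scale_inv_name)
    if other_file is None:
        return None
    return load_shard(other_file).get(scale_inv_name)


def release_cached_memory() -> None:
    """Drop Python garbage, then release unused cached GPU memory if CUDA is present."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def process_shards(filenames: List[str], weight_map: Dict[str, str],
                   load_shard: Callable[[str], Shard]) -> List[str]:
    """Visit shards in sorted order; return weights that have no scale tensor."""
    missing: List[str] = []
    for filename in sorted(filenames):  # deterministic, name-ordered traversal
        shard = load_shard(filename)
        for name in shard:
            if not name.endswith(".weight"):
                continue
            if find_scale_inv(name, shard, weight_map, load_shard) is None:
                missing.append(name)  # reported instead of asserted on
        del shard
        release_cached_memory()
    return missing
```

Reporting missing names rather than asserting lets the caller decide whether a weight without a scale tensor is an error or simply a tensor that needs no scaling.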