diff --git a/README.md b/README.md
index e5fe885..60042ba 100644
--- a/README.md
+++ b/README.md
@@ -55,23 +55,26 @@ python eval_multigpu.py \
 This script calculates the scores for each expert based on the evaluation datasets.
 **Usage:**
 ```bash
+export PYTHONPATH=$PYTHONPATH:$(pwd)
 python scripts/expert/get_expert_scores.py \
-    --eval_dataset=translation \
+    --eval_dataset=intent \
     --base_model_path=deepseek-ai/ESFT-vanilla-lite \
-    --output_dir=results/expert_scores/translation \
+    --output_dir=results/expert_scores/intent \
     --n_sample_tokens=131072 \
     --world_size=4 \
     --gpus_per_rank=2
+    # for N gpus, world_size should be N / gpus_per_rank
 ```
 
 3. **generate_expert_config.py**
 This script generates the configuration to convert a MoE model with only task-relevant tasks trained based on evaluation scores.
 **Usage:**
 ```bash
+export PYTHONPATH=$PYTHONPATH:$(pwd)
 python scripts/expert/generate_expert_config.py \
-    --eval_datasets=intent,summary,law,translation \
-    --expert_scores_dir=results/expert_scores \
-    --output_dir=results/expert_configs \
+    --eval_dataset=intent \
+    --expert_scores_dir=results/expert_scores/intent \
+    --output_path=results/expert_configs/intent.json \
     --score_function=token \
     --top_p=0.2 # the scoring function and top_p are hyperparameters
 ```