Program-of-Thought (PoT) fine-tuning and evaluation for GSM8K-style perturbed math data.
Install dependencies:

```bash
pip install -r requirements.txt
```

If needed:

```bash
pip install tf-keras
```

Fine-tune:

```bash
python finetune_pot.py \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --data_path training_data/gsm8k_concrete_training_data.json \
    --output_dir pot-finetuned
```

Evaluate with the PoT prompt:

```bash
python evaluate_prompts.py \
    --model_path pot-finetuned \
    --dataset_path gsm_perturbed_with_new_questions.json \
    --prompt pot
```
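For quick interactive checks outside `evaluate_prompts.py`, the fine-tuned model can be loaded directly. This is a minimal sketch, assuming `finetune_pot.py` saves a PEFT/LoRA adapter to `--output_dir` (suggested by the `--lora_*` flags, but not confirmed here); if the script saves a merged model instead, load it with `AutoModelForCausalLM.from_pretrained("pot-finetuned")` alone.

```python
# Sketch: load the base model plus the LoRA adapter saved by fine-tuning.
# Assumes --output_dir contains a PEFT adapter (adapter_config.json etc.).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(
    base_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "pot-finetuned")

prompt = "Write Python code that prints the answer: A book costs $12. How much do 7 books cost?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```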
finetune_pot.py: training script. Command-line arguments:

- `--model_name`, `--data_path`, `--prompt_key`, `--completion_key`, `--output_dir` (the expected data format is sketched below)
- `--num_epochs`, `--batch_size`, `--learning_rate`, `--gradient_accumulation_steps`, `--max_length`, `--warmup_steps`
- `--lora_r`, `--lora_alpha`, `--lora_dropout`, `--use_4bit`
- `--train_size`, `--eval_size`
- `--logging_steps`, `--save_steps`
- `--wandb_project`, `--wandb_run_name`, `--no_wandb`
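The exact layout of the `--data_path` file isn't documented in this section; the `--prompt_key` and `--completion_key` arguments suggest a list of records with one prompt field and one code-completion field. A minimal sketch of such a file, with the field names below being assumptions (match them to whatever you pass for `--prompt_key` / `--completion_key`):

```python
# Hypothetical training record layout; the key names "prompt" and "completion"
# are assumptions and should match --prompt_key / --completion_key.
import json

records = [
    {
        "prompt": (
            "A book costs $12. How much do 7 books cost? "
            "Write Python code that prints the answer."
        ),
        "completion": "price = 12\nnum_books = 7\nprint(price * num_books)",
    }
]

with open("training_data/example_pot_data.json", "w") as f:
    json.dump(records, f, indent=2)
```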
evaluate_prompts.py: evaluation script. Command-line arguments:

- `--model_path`, `--dataset_path`, `--prompt`, `--prompt_templates`, `--output_file`
- `--device`, `--use_base_model`
- `--num_attempts`, `--temperature`, `--top_p`, `--max_new_tokens`
- `--tolerance` (see the answer-checking sketch below)
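The scoring details live in `evaluate_prompts.py`. As a rough mental model of PoT evaluation (not the repo's exact code), the generated program is executed and its printed result is compared to the gold answer, with `--tolerance` allowing small numeric differences. A self-contained sketch under those assumptions:

```python
# Sketch of PoT-style answer checking; not the repo's actual implementation.
# Assumes the generated program prints a single numeric answer and that
# --tolerance bounds the allowed absolute difference.
import subprocess

def run_generated_program(code: str, timeout: float = 5.0) -> str:
    """Run model-generated Python in a subprocess and capture its stdout."""
    result = subprocess.run(
        ["python", "-c", code], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()

def is_correct(predicted: str, gold: str, tolerance: float = 1e-4) -> bool:
    """Accept the prediction if it matches the gold answer within tolerance."""
    try:
        return abs(float(predicted) - float(gold)) <= tolerance
    except ValueError:
        # Fall back to exact string match for non-numeric answers.
        return predicted.strip() == gold.strip()

generated_code = "price = 12\nnum_books = 7\nprint(price * num_books)"
print(is_correct(run_generated_program(generated_code), "84"))  # True
```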
prompt_templates.json: prompt templates used during evaluation (a hypothetical layout is sketched below)
training_data/: example datasets
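The schema of prompt_templates.json isn't shown in this section. A plausible layout, where every key and placeholder below is an assumption, maps a template name (selectable via `--prompt`, e.g. `pot`) to a prompt string:

```python
# Hypothetical prompt_templates.json layout; the real file's keys and
# placeholder names may differ. "pot" mirrors the --prompt pot example above.
import json

templates = {
    "pot": (
        "Solve the following math problem by writing Python code that "
        "prints the final answer.\n\nProblem: {question}\n\nPython code:"
    ),
}

with open("prompt_templates_example.json", "w") as f:
    json.dump(templates, f, indent=2)
```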
This repo does not include the ReasonAgain dataset. You can find it in the CogComp reasoning-eval repository.