Description
Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.
Shell script that was run:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift rlhf \
    --rlhf_type grpo \
    --model '/public/home/202420164005/model/Qwen/Qwen2.5-Omni-7B' \
    --adapters '/public/home/202420164005/code/ms-swift/SFT/checkpoints/v3-20251203-105243/checkpoint-180' \
    --external_plugins '/public/home/202420164005/code/ms-swift/GRPO/reward.py' \
    --reward_funcs external_ad_content_llm_remote \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset '/public/home/202420164005/code/ms-swift/GRPO/data/train_gemini_rewrite.jsonl' \
    --val_dataset '/public/home/202420164005/code/ms-swift/GRPO/data/val_gemini_rewrite.jsonl' \
    --max_length 4096 \
    --max_completion_length 800 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 20 \
    --save_steps 20 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir '/public/home/202420164005/code/ms-swift/GRPO/checkpoint_1128' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --num_generations 4 \
    --temperature 0.9 \
    --system '你是一名顶尖的语音语言学和认知神经科学专家。' \
    --log_completions true
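For readers checking the batch settings in the command above, here is a rough sketch of the arithmetic. It assumes the TRL-style GRPO convention in which the batch size is counted in completions and the per-step total must be divisible by `num_generations`; whether this ms-swift version applies exactly that rule is an assumption, and `grpo_batch_layout` is a hypothetical helper, not part of any library.

```python
def grpo_batch_layout(world_size, per_device_batch, grad_accum, num_generations):
    """Return how many completions and unique prompts one optimizer step covers.

    Assumption: the batch is counted in completions, and each prompt
    contributes exactly `num_generations` completions to the step.
    """
    completions_per_step = world_size * per_device_batch * grad_accum
    if completions_per_step % num_generations != 0:
        raise ValueError(
            f"{completions_per_step} completions per step is not divisible "
            f"by num_generations={num_generations}"
        )
    return {
        "completions_per_step": completions_per_step,
        "unique_prompts_per_step": completions_per_step // num_generations,
    }


# Values taken from the command: 4 processes, batch size 4,
# gradient accumulation 1, 4 generations per prompt.
layout = grpo_batch_layout(world_size=4, per_device_batch=4,
                           grad_accum=1, num_generations=4)
print(layout)
```

Under that assumption, the settings in this report are at least self-consistent, so a divisibility error is unlikely to be what stops the run.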
Terminal output:
run sh: /public/home/202420164005/miniconda3/envs/swift-env/bin/python3.10 -m torch.distributed.run --nproc_per_node 4 /public/home/202420164005/code/ms-swift/swift/cli/rlhf.py --rlhf_type grpo --model /public/home/202420164005/model/Qwen/Qwen2.5-Omni-7B --adapters /public/home/202420164005/code/ms-swift/SFT/checkpoints/v3-20251203-105243/checkpoint-180 --external_plugins /public/home/202420164005/code/ms-swift/GRPO/reward.py --reward_funcs external_ad_content_llm_remote --train_type lora --lora_rank 8 --lora_alpha 32 --target_modules all-linear --torch_dtype bfloat16 --dataset /public/home/202420164005/code/ms-swift/GRPO/data/train_gemini_rewrite.jsonl --val_dataset /public/home/202420164005/code/ms-swift/GRPO/data/val_gemini_rewrite.jsonl --max_length 4096 --max_completion_length 800 --num_train_epochs 5 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --learning_rate 1e-5 --gradient_accumulation_steps 1 --eval_steps 20 --save_steps 20 --save_total_limit 2 --logging_steps 5 --output_dir /public/home/202420164005/code/ms-swift/GRPO/checkpoint_1128 --warmup_ratio 0.05 --dataloader_num_workers 4 --num_generations 4 --temperature 0.9 --system 你是一名顶尖的语音语言学和认知神经科学专家。 --log_completions true
[INFO:swift] Successfully registered /public/home/202420164005/code/ms-swift/swift/llm/dataset/data/dataset_info.json.
[INFO:swift] Setting args.remove_unused_columns: False
[INFO] Using scorer model: gemini-2.5-flash
[INFO] API endpoint: https://api.apiyi.com/v1/chat/completions
[INFO] Using scorer model: gemini-2.5-flash
[INFO] API endpoint: https://api.apiyi.com/v1/chat/completions
[INFO] Using scorer model: gemini-2.5-flash
[INFO] API endpoint: https://api.apiyi.com/v1/chat/completions
[INFO] Using scorer model: gemini-2.5-flash
[INFO] API endpoint: https://api.apiyi.com/v1/chat/completions
[INFO:swift] Successfully imported external_plugins: ['/public/home/202420164005/code/ms-swift/GRPO/reward.py'].
[INFO:swift] rank: 0, local_rank: 0, world_size: 4, local_world_size: 4
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
torch_dtype is deprecated! Use dtype instead!
torch_dtype is deprecated! Use dtype instead!
torch_dtype is deprecated! Use dtype instead!
torch_dtype is deprecated! Use dtype instead!
[INFO:swift] Because len(args.val_dataset) > 0, setting split_dataset_ratio: 0.0
[INFO:swift] Setting args.lazy_tokenize: True
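The `Using scorer model` and `API endpoint` lines appear four times in the output above, which is consistent with the plugin being imported once per worker process (`NPROC_PER_NODE=4`). The source of `reward.py` is not shown, so the following is only a sketch of how such import-time messages could be limited to one process, assuming the launcher sets the `RANK` environment variable as `torch.distributed.run` does. `is_main_process` and `log_once` are hypothetical helper names.

```python
import os


def is_main_process(env=None):
    """True when this process is global rank 0 (or RANK is unset)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


def log_once(message, env=None):
    """Print `message` only on rank 0; return whether it was printed."""
    if is_main_process(env):
        print(message)
        return True
    return False


# Example: the same call made from rank 0 and from rank 3.
printed_on_rank0 = log_once("[INFO] Using scorer model: gemini-2.5-flash", {"RANK": "0"})
printed_on_rank3 = log_once("[INFO] Using scorer model: gemini-2.5-flash", {"RANK": "3"})
```

This only reduces log noise. Each rank would still load the plugin and call the remote scorer independently.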
Your hardware and system info
Write your system info here, such as CUDA version, operating system, GPU model, and torch version.
Additional context
Add any other context about the problem here.