Training details?
Would love to have more details on how you did the POLAR training part!
Basically just https://github.com/RowitZou/POLAR_RFT/blob/main/examples/ppo/qwen3-8b_hh-rlhf.sh on 8x H100, with the script edited to train on 7 GPUs while an lmdeploy server hosting POLAR-7B ran on the 8th GPU. I did 2 epochs on a subset of my roleplay data, then trained for another epoch with the rollout temperature set to 1.5 and min_p 0.01 (verl doesn't support setting min_p, so I had to add support for it myself; it was like a 2-line code change IIRC). Rough launch sketch below.
(That was after 2 epochs of SFT on the full data with Axolotl.)
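For anyone trying to reproduce it, here's a hand-written sketch of how the pieces fit together, not the actual edited script: the lmdeploy invocation, the checkpoint path, and the verl override keys are my assumptions from the upstream defaults, so double-check them against the repo.

```bash
# POLAR-7B reward server pinned to the 8th GPU (index 7)
CUDA_VISIBLE_DEVICES=7 lmdeploy serve api_server internlm/POLAR-7B \
    --server-port 30000 &

# PPO training on the remaining 7 GPUs (verl hydra-style overrides;
# the script also points the trainer at the reward server's URL -- key omitted here)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6 python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=/path/to/my-sft-checkpoint \
    actor_rollout_ref.rollout.temperature=1.0 \
    trainer.n_gpus_per_node=7 \
    trainer.total_epochs=2

# The extra epoch was a re-run with actor_rollout_ref.rollout.temperature=1.5
# plus min_p=0.01. min_p isn't a stock verl sampling option, so the patch just
# threads a min_p config value through to the rollout engine's SamplingParams
# (that's the "2-line change").
```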
Thanks, that's very helpful. I was a bit shocked that a 12B model required 8 H100s to train. I guess POLAR is extremely resource-demanding, huh?
Online RL in general is extremely demanding: you already need two instances of the model in memory (the trainable policy plus the rollout engine's copy) with all their associated costs, on top of gradients, optimizer states, and a frozen reference model for the KL penalty.
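For intuition on where the memory goes, some back-of-the-envelope arithmetic (round numbers of my own, not measurements from this run):

```bash
# Rough VRAM budget for a 12B-param policy trained with PPO (very rough)
P=12                                     # parameters, in billions
echo "policy weights (bf16):       $((P * 2)) GB"
echo "gradients (bf16):            $((P * 2)) GB"
echo "AdamW m+v states (fp32):     $((P * 8)) GB"
echo "frozen KL reference (bf16):  $((P * 2)) GB"
echo "rollout engine copy (bf16):  $((P * 2)) GB  (plus KV cache)"
echo "total:                       $((P * 16)) GB  before activations"
```

That's already a big chunk of 8x H100 (640 GB) before activations, KV cache, and the reward server, even though sharding and offload strategies shift the numbers around a lot in practice.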