sleeepeer/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive-llm-judge-42 Text Generation • 8B • Updated Jul 16 • 11
sleeepeer/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive_least_similar-llm-judge-42 Text Generation • 8B • Updated Jul 16 • 9
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-3000 Updated Jul 14
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-4000 Updated Jul 14
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-mixed-llm-judge-42 Text Generation • 8B • Updated Jul 10 • 11
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-7 Text Generation • 8B • Updated Jul 7 • 6
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-6 Text Generation • 8B • Updated Jul 7 • 8
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-5 Text Generation • 8B • Updated Jul 6 • 6
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-4 Text Generation • 8B • Updated Jul 6 • 6
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-3 Text Generation • 8B • Updated Jul 6 • 8
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-2 Text Generation • 8B • Updated Jul 6 • 9
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-1 Text Generation • 8B • Updated Jul 5 • 8