Cornell-AGI

university

AI & ML interests

Reinforcement Learning from Human Feedback

Cornell-AGI's models (20)

Cornell-AGI/apo_math_qwen2.5_1.5b

Text Generation • 2B • Updated May 5, 2025 • 1

Cornell-AGI/ppo_math_qwen2.5_1.5b

Text Generation • 2B • Updated May 5, 2025 • 1

Cornell-AGI/rebel_math_qwen2.5_1.5b

Text Generation • 2B • Updated May 5, 2025 • 1

Cornell-AGI/grpo_math_qwen2.5_3b

Text Generation • 3B • Updated May 5, 2025

Cornell-AGI/grpo_math_qwen2.5_1.5b

Text Generation • 2B • Updated May 5, 2025

Cornell-AGI/ppo_math_qwen2.5_3b

Text Generation • 3B • Updated May 5, 2025 • 1

Cornell-AGI/rebel_math_qwen2.5_3b

Text Generation • 3B • Updated May 5, 2025

Cornell-AGI/apo_math_qwen2.5_3b

Text Generation • 3B • Updated May 5, 2025

Cornell-AGI/grpo_math_qwen2.5_7b

Text Generation • 8B • Updated May 5, 2025

Cornell-AGI/ppo_math_qwen2.5_7b

Text Generation • 8B • Updated May 5, 2025

Cornell-AGI/rebel_math_qwen2.5_7b

Text Generation • 8B • Updated May 4, 2025

Cornell-AGI/apo_math_qwen2.5_7b

Text Generation • 8B • Updated May 4, 2025 • 1

Cornell-AGI/REFUEL-Llama-3-Armo-iter_2

8B • Updated Oct 8, 2024 • 2

Cornell-AGI/REFUEL-Llama-3-Armo-iter_1

8B • Updated Oct 8, 2024 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_3

8B • Updated Sep 2, 2024 • 2 • 2

Cornell-AGI/REBEL-Llama-3-Armo-iter_2

8B • Updated Sep 2, 2024 • 7 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_1

8B • Updated Sep 2, 2024 • 2 • 1

Cornell-AGI/REBEL-Llama-3-epoch_2

Text Generation • Updated Sep 1, 2024 • 3 • 3

Cornell-AGI/REBEL-Llama-3

Text Generation • Updated Sep 1, 2024 • 4 • 1

Cornell-AGI/REBEL-OpenChat-3.5

Text Generation • Updated Sep 1, 2024 • 7 • 1