OpenAssistant/reward-model-deberta-v3-large-v2 Text Classification • Updated Feb 1, 2023 • 5.99k • 240
nvidia/Nemotron-RL-instruction_following-structured_outputs Viewer • Updated 20 days ago • 9.95k • 407 • 25
instruction-pretrain/general-instruction-augmented-corpora Preview • Updated Mar 1, 2025 • 5.13k • 20
ByteDance-Seed/Seed-OSS-36B-Instruct Text Generation • 36B • Updated Aug 26, 2025 • 9.93k • 469
allenai/tulu-3-sft-personas-instruction-following Viewer • Updated Nov 21, 2024 • 30k • 2.1k • 60
Article • Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment • Feb 11, 2025 • 95