UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published 18 days ago • 67
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks Paper • 2510.02286 • Published 25 days ago • 28
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 83
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 42
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy Paper • 2506.13284 • Published Jun 16 • 26
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published Jun 8 • 28
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer Paper • 2506.06952 • Published Jun 8 • 9