LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification Paper • 2506.01484 • Published Jun 2, 2025 • 6
WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks Paper • 2506.01952 • Published Jun 2, 2025 • 10
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30, 2025 • 74