Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores.
shawnxzhu
AI & ML interests
None yet
Recent Activity
upvoted a paper 6 days ago
QueST: Incentivizing LLMs to Generate Difficult Problems
upvoted a paper about 2 months ago
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
Organizations
None yet