rr123

raymond1113

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

Scaling Agents via Continual Pre-training

upvoted a paper about 1 month ago

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

upvoted a paper about 1 month ago

Towards General Agentic Intelligence via Environment Scaling

Organizations

None yet

upvoted 5 papers about 1 month ago

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 111

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16 • 88

Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16 • 70

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published Sep 16 • 104

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16 • 67

upvoted a collection 3 months ago

Qwen3

Collection • 84 items • Updated Aug 6 • 1.37k

upvoted a paper 4 months ago

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 120

upvoted a collection 8 months ago

DeepSeek-R1

Collection • 10 items • Updated May 29 • 807

upvoted an article 8 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 363

upvoted a paper 9 months ago

Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament

Paper • 2501.13007 • Published Jan 22 • 20

upvoted a paper about 1 year ago

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

Paper • 2410.16184 • Published Oct 21, 2024 • 25
