Xuehai Pan

XuehaiPan

https://github.com/XuehaiPan

AI & ML interests

Reinforcement Learning & Multi-Agent Systems & Game Theory

Recent Activity

authored 5 papers about 1 month ago

AI Alignment: A Comprehensive Survey

Paper • 2310.19852 • Published Oct 30, 2023

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction

Paper • 2402.02416 • Published Feb 4, 2024 • 4

Reward Generalization in RLHF: A Topological Perspective

Paper • 2402.10184 • Published Feb 15, 2024

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22, 2025 • 126

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 244

updated 14 models over 1 year ago

PKU-Alignment/alpaca-8b-reproduced-llama-3

8B • Updated May 9, 2024 • 4.06k

PKU-Alignment/alpaca-7b-reproduced-llama-2

7B • Updated May 9, 2024 • 75 • 1

PKU-Alignment/beaver-7b-v3.0

Reinforcement Learning • 7B • Updated May 9, 2024 • 58

PKU-Alignment/beaver-7b-v2.0

Reinforcement Learning • 7B • Updated May 9, 2024 • 9

PKU-Alignment/beaver-7b-v1.0

Reinforcement Learning • 7B • Updated May 9, 2024 • 99 • 13

PKU-Alignment/alpaca-7b-reproduced

7B • Updated May 9, 2024 • 11.1k • 5

PKU-Alignment/beaver-7b-unified-reward

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 535

PKU-Alignment/beaver-7b-unified-cost

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 708 • 2

PKU-Alignment/beaver-7b-v3.0-reward

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 57

PKU-Alignment/beaver-7b-v3.0-cost

Reinforcement Learning • 13B • Updated Apr 20, 2024 • 49

PKU-Alignment/beaver-7b-v2.0-reward

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 15

PKU-Alignment/beaver-7b-v2.0-cost

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 250

PKU-Alignment/beaver-7b-v1.0-reward

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 2.47k • 17

PKU-Alignment/beaver-7b-v1.0-cost

Reinforcement Learning • 7B • Updated Apr 20, 2024 • 2.45k • 10

New activity in PKU-Alignment/beaver-7b-v1.0-reward over 1 year ago

Adding `safetensors` variant of this model

#2 opened over 1 year ago by