Offline Regularised Reinforcement Learning for Large Language Models Alignment • Paper 2405.19107 • Published May 29, 2024
Understanding the performance gap between online and offline alignment algorithms • Paper 2405.08448 • Published May 14, 2024