Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols Paper • 2510.09462 • Published Oct 10, 2025
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22, 2025
Provable Compositional Generalization for Object-Centric Learning Paper • 2310.05327 • Published Oct 9, 2023