Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols Paper • 2510.09462 • Published Oct 10, 2025