Mikhail Terekhov

terekhov · MikhailTerekhov

AI & ML interests

Reinforcement Learning, Multi-objective Reinforcement Learning, RLHF

authored 3 papers 3 months ago

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28, 2024 • 84

Control Tax: The Price of Keeping AI in Check

Paper • 2506.05296 • Published Jun 5

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Paper • 2510.09462 • Published Oct 10 • 5