Revisiting Generalization Across Difficulty Levels: It's Not So Easy Paper • 2511.21692 • Published about 1 month ago • 15
MIB Datasets Collection The tasks and counterfactuals from the Mechanistic Interpretability Benchmark. • 7 items • Updated Apr 16 • 4
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions Paper • 2502.04322 • Published Feb 6 • 3