# Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
[arXiv paper](https://arxiv.org/abs/2410.22366) · [Hugging Face demo](https://huggingface.co/spaces/surokpro2/Unboxing_SDXL_with_SAEs) · [Colab notebook](https://colab.research.google.com/drive/1lWZ2yCRwCf4iuykvb-91QYUNkuzIwI3k?usp=sharing)
This repository contains code to reproduce results from our paper on using sparse autoencoders (SAEs) to analyze and interpret the internal representations of text-to-image diffusion models, specifically SDXL Turbo.
## Repository Structure
```
|-- SAE/                            # Core sparse autoencoder implementation
|-- SDLens/                         # Tools for analyzing diffusion models
|   `-- hooked_sd_pipeline.py       # Modified stable diffusion pipeline
|-- scripts/
|   |-- collect_latents_dataset.py  # Generate training data
|   `-- train_sae.py                # Train SAE models
|-- utils/
|   `-- hooks.py                    # Hook utility functions
|-- checkpoints/                    # Pretrained SAE model checkpoints
|-- app.py                          # Demo application
|-- app.ipynb                       # Interactive notebook demo
|-- example.ipynb                   # Usage examples
`-- requirements.txt                # Python dependencies
```
## Installation
```bash
pip install -r requirements.txt
```
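As an optional sanity check after installing (assuming `torch` and `diffusers` are among the pinned dependencies and a CUDA GPU is available for SDXL Turbo inference), you can confirm that the core packages import:

```python
# Optional post-install sanity check (assumes torch and diffusers come from requirements.txt).
import torch
import diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
```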
## Demo Application
You can try our Gradio demo application (`app.ipynb`) to browse and experiment with 20K+ features of our trained SAEs out of the box. The same notebook is also available on [Google Colab](https://colab.research.google.com/drive/1lWZ2yCRwCf4iuykvb-91QYUNkuzIwI3k?usp=sharing).
## Usage
1. Collect latent data from SDXL Turbo (see the activation-capture sketch below):
```bash
python scripts/collect_latents_dataset.py --save_path={your_save_path}
```
2. Train sparse autoencoders (see the SAE sketch below):
   2.1. Set the path to the stored latents and the checkpoint output directory in `SAE/config.json`.
   2.2. Run the training script:
```bash
python scripts/train_sae.py
```
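For intuition on step 1: `collect_latents_dataset.py` relies on the hooked pipeline in `SDLens/hooked_sd_pipeline.py` to capture intermediate U-Net activations while SDXL Turbo generates images. The snippet below is only a rough sketch of that idea using plain `diffusers` and PyTorch forward hooks, not the script's actual implementation; the attachment point `down_blocks[2].attentions[1]`, the prompt, and the assumption of a CUDA GPU are arbitrary examples.

```python
# Rough sketch of activation capture with a PyTorch forward hook (illustrative only;
# the repository's SDLens/hooked_sd_pipeline.py implements this properly).
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

captured = []

def save_activation(module, inputs, output):
    # Transformer blocks may return a tuple; keep the main hidden-state tensor.
    tensor = output[0] if isinstance(output, tuple) else output
    captured.append(tensor.detach().cpu())

# Example attachment point inside the U-Net; the paper analyzes several transformer blocks.
handle = pipe.unet.down_blocks[2].attentions[1].register_forward_hook(save_activation)

pipe("a photo of a red fox in the snow", num_inference_steps=1, guidance_scale=0.0)
handle.remove()

print(len(captured), captured[0].shape)
```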
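For intuition on step 2: following the TopK design of `openai/sparse_autoencoder` (on which the SAE component is based), the SAE encodes each activation vector into a much wider latent vector, keeps only the k largest latents, and reconstructs the input under an MSE loss. Below is a minimal, self-contained sketch of that forward pass and one training step; the class name and the dimensions (1280-dimensional activations, 5120 latents, k=20) are illustrative placeholders, and the real hyperparameters live in `SAE/config.json`.

```python
# Minimal TopK sparse autoencoder sketch (illustrative; see SAE/ for the real implementation).
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x):
        pre = self.encoder(x)
        # Keep only the k largest pre-activations per sample, zero out the rest.
        top = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre).scatter_(-1, top.indices, top.values.relu())
        return self.decoder(latents), latents

# One hypothetical training step on a batch of collected activations.
sae = TopKSAE(d_model=1280, n_latents=5120, k=20)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

batch = torch.randn(64, 1280)  # stand-in for real latents collected in step 1
recon, latents = sae(batch)
loss = nn.functional.mse_loss(recon, batch)
loss.backward()
opt.step()
```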
## Pretrained Models
We provide pretrained SAE checkpoints for four key transformer blocks in SDXL Turbo's U-Net in the `checkpoints` folder. See `example.ipynb` for analysis examples and visualizations of the learned features. Additional pretrained SAEs trained with different parameters are available in the [Hugging Face repository](https://huggingface.co/surokpro2/sdxl-saes/tree/main).
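Those additional checkpoints can also be fetched programmatically with `huggingface_hub` (install it separately if it is not already pulled in by `requirements.txt`):

```python
# Download the additional pretrained SAEs from the Hugging Face Hub.
# Requires the huggingface_hub package (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="surokpro2/sdxl-saes")
print("SAE checkpoints downloaded to:", local_dir)
```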
## Citation
If you find this code useful in your research, please cite our paper:
```bibtex
@misc{surkov2024unpackingsdxlturbointerpreting,
      title={Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders},
      author={Viacheslav Surkov and Chris Wendler and Mikhail Terekhov and Justin Deschenaux and Robert West and Caglar Gulcehre},
      year={2024},
      eprint={2410.22366},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.22366},
}
```
## Acknowledgements
The SAE component was implemented based on the [`openai/sparse_autoencoder`](https://github.com/openai/sparse_autoencoder) repository.