---
title: Token Attention Viewer
emoji: 📈
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Interactive visualization of attention weights in LLMs word-by-word
---
# Token-Attention-Viewer

Token Attention Viewer is an interactive Gradio app that **generates text with a causal language model** and **visualizes attention word by word** for every generated token. It helps researchers, students, and developers explore how models like GPT-2 or LLaMA focus on different parts of the input as they generate text.

Each word in the generated continuation is rendered as part of a paragraph; the **background opacity** behind a word reflects the **sum of attention weights** that the selected (query) word assigns to the context. You can also switch between many popular Hugging Face models.
---
## ✨ What the app does
* **Generate** a continuation from your prompt using a selected causal LM (GPT-2, OPT, Mistral, etc.).
* **Select a generated word** to inspect.
* **Visualize attention** as a semi-transparent background behind words (no plotting libraries such as matplotlib).
* **Mean across layers/heads** or inspect a specific layer/head.
* **Proper detokenization** into real words (regex-based), with **EOS tokens stripped** (no `<|endoftext|>` clutter).
* **Paragraph wrapping**: words wrap to new lines automatically inside the box.
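The opacity mapping described above can be sketched roughly as follows. This is an illustrative snippet, not the app's actual code: `render_attention_html`, the orange highlight color, and the sample weights are all assumptions made for the example.

```python
# Hypothetical sketch of the opacity mapping: each context word gets a
# background alpha proportional to the attention the selected query word
# assigns to it. Weights are normalized by the largest weight so the most
# attended word is fully opaque.
import html

def render_attention_html(words, weights):
    """Wrap each word in a <span> whose background alpha follows its weight."""
    peak = max(weights) or 1.0           # guard against an all-zero row
    spans = []
    for word, w in zip(words, weights):
        alpha = round(w / peak, 2)       # normalize to [0, 1]
        spans.append(
            f'<span style="background: rgba(255, 165, 0, {alpha})">'
            f"{html.escape(word)}</span>"
        )
    return " ".join(spans)               # words wrap like a normal paragraph

if __name__ == "__main__":
    print(render_attention_html(["The", "cat", "sat"], [0.10, 0.55, 0.05]))
```

A string like this can be handed directly to a Gradio `HTML` component, which is one simple way to get word-level highlighting without any plotting library.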
---
## 🚀 Quickstart
### 1) Clone
```bash
git clone https://github.com/devMuniz02/Token-Attention-Viewer
cd Token-Attention-Viewer
```
### 2) (Optional) Create a virtual environment
**Windows (PowerShell):**
```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```
**macOS / Linux (bash/zsh):**
```bash
python3 -m venv venv
source venv/bin/activate
```
### 3) Install requirements
Install the dependencies:
```bash
pip install -r requirements.txt
```
### 4) Run the app
```bash
python app.py
```
You should see Gradio report a local URL similar to:
```
Running on local URL: http://127.0.0.1:7860
```
### 5) Open in your browser
Open the printed URL (default `http://127.0.0.1:7860`) in your browser.
---
## 🧭 How to use
1. **Model**: pick a model from the dropdown and click **Load / Switch Model**.
* Small models (e.g., `distilgpt2`, `gpt2`) run on CPU.
* Larger models (e.g., `mistralai/Mistral-7B-v0.1`) generally need a GPU with enough VRAM.
2. **Prompt**: enter your starting text.
3. **Generate**: click **Generate** to produce a continuation.
4. **Inspect**: select any **generated word** (radio buttons).
* The paragraph box highlights where that word attends.
* Toggle **Mean Across Layers/Heads** or choose a specific **layer/head**.
5. Repeat with different models or prompts.
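The "Mean Across Layers/Heads" toggle in step 4 amounts to averaging one query word's attention row over every layer and head. A minimal sketch, assuming the weights for one query word are arranged as a nested list indexed `[layer][head][key]` (the function name and sample numbers are illustrative, not the app's code):

```python
# Illustrative sketch of the layer/head selection logic: return either the
# attention row at a specific (layer, head) pair, or the element-wise mean
# over all layers and heads.
def attention_row(attn, layer=None, head=None):
    """attn: nested list [layer][head][key] of weights for one query word."""
    if layer is not None and head is not None:
        return attn[layer][head]
    n_layers, n_heads = len(attn), len(attn[0])
    n_keys = len(attn[0][0])
    mean = [0.0] * n_keys
    for lyr in attn:                     # accumulate over every layer...
        for row in lyr:                  # ...and every head within it
            for k, w in enumerate(row):
                mean[k] += w
    return [w / (n_layers * n_heads) for w in mean]

if __name__ == "__main__":
    # 2 layers x 2 heads x 3 context tokens
    attn = [
        [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]],
        [[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]],
    ]
    print(attention_row(attn))                   # mean over layers and heads
    print(attention_row(attn, layer=0, head=1))  # one specific head
```

In the real app the same reduction would be done on the attention tensors the model returns, but the indexing convention is the same.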
---
## 🧩 Files
* `app.py` — Gradio application (UI + model loading + attention visualization).
* `requirements.txt` — Python dependencies.
* `README.md` — this file.
---
## 🛠️ Troubleshooting
* **Radio/choices error**: If you switch models and see a Gradio “value not in choices” error, ensure the app resets the radio with `value=None` (the included code already does this).
* **`<|endoftext|>` shows up**: The app strips **trailing** special tokens from the generated segment, so EOS shouldn’t appear. If you still see it in the middle, your model truly generated it as a token.
* **OOM / model too large**:
* Try a smaller model (`distilgpt2`, `gpt2`, `facebook/opt-125m`).
* Reduce `Max New Tokens`.
* Use CPU for smaller models or a GPU with more VRAM for bigger ones.
* **Slow generation**: Smaller models or CPU mode will be slower; consider using GPU and the `accelerate` package.
* **Missing tokenizer pad token**: The app sets `pad_token_id = eos_token_id` automatically when needed.
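The trailing-EOS behavior noted above can be sketched like this. It is a hedged illustration, not the app's exact implementation: the token list and function name are assumptions, and the point is only that special tokens are removed from the *end* of the segment, so one the model genuinely emitted mid-sequence survives.

```python
# Illustrative sketch: strip any run of trailing special tokens (plus
# whitespace) from a generated segment, leaving mid-sequence ones intact.
SPECIAL_TOKENS = ("<|endoftext|>", "</s>")  # example special tokens

def strip_trailing_special(text):
    """Remove trailing special tokens and surrounding whitespace."""
    text = text.rstrip()
    changed = True
    while changed:                        # keep going while the tail shrinks
        changed = False
        for tok in SPECIAL_TOKENS:
            if text.endswith(tok):
                text = text[: -len(tok)].rstrip()
                changed = True
    return text

if __name__ == "__main__":
    print(strip_trailing_special("Hello world<|endoftext|><|endoftext|>"))
    print(strip_trailing_special("a<|endoftext|>b<|endoftext|>"))
```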
---
## 🔒 Access-gated models
Some families (e.g., **LLaMA**, **Gemma**) require you to accept licenses or request access on Hugging Face. Make sure your Hugging Face account has access before trying to load those models.
---
## 📣 Acknowledgments
* Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
* Attention visualization inspired by the standard causal LM attention tensors available from `generate(output_attentions=True)`.