---
title: Token Attention Viewer
emoji: 📈
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Interactive visualization of attention weights in LLMs word-
---

# Token-Attention-Viewer
Token Attention Viewer is an interactive Gradio app that visualizes the self-attention weights inside transformer language models for every generated token. It helps researchers, students, and developers explore how models like GPT-2 or LLaMA focus on different parts of the input as they generate text.

# Word-Level Attention Visualizer (Gradio)

An interactive Gradio app to **generate text with a causal language model** and **visualize attention word-by-word**.
The generated continuation is rendered like a paragraph; the **background opacity** behind each context word reflects the **sum of attention weights** that the selected (query) word assigns to that word's tokens. You can also switch between many popular Hugging Face models.
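
For intuition, here is a minimal sketch of that reduction (illustrative names, not the app's exact code). It collapses one query token's attention row into a per-word sum and normalizes so the most-attended word is fully opaque:

```python
import numpy as np

def word_opacities(attn_row, word_token_spans):
    """Collapse a query token's attention row into one opacity per word.

    attn_row:         1-D array of attention weights from the query token
                      to every context token (rows of a causal LM
                      attention map sum to ~1).
    word_token_spans: list of (start, end) token-index ranges, one per
                      detokenized word.
    Returns values in [0, 1], suitable for a CSS background alpha.
    """
    sums = np.array([attn_row[s:e].sum() for s, e in word_token_spans])
    peak = sums.max()
    return sums / peak if peak > 0 else sums
```

Each word can then be wrapped in a `<span>` whose `rgba(...)` alpha channel is driven by its opacity value.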

---

## ✨ What the app does

* **Generate** a continuation from your prompt using a selected causal LM (GPT-2, OPT, Mistral, etc.).
* **Select a generated word** to inspect.
* **Visualize attention** as a semi-transparent background behind words (no plotting libraries such as matplotlib).
* **Mean across layers/heads** or inspect a specific layer/head (see the sketch after this list).
* **Proper detokenization** into real words (regex-based), with **EOS tokens stripped** (no `<|endoftext|>` clutter).
* **Paragraph wrapping**: words wrap to new lines automatically inside the box.
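
The layer/head controls amount to reducing the stacked attention tensor before any per-word sums are taken. A minimal sketch, assuming the attentions have been stacked into a `(layers, heads, query_len, key_len)` tensor (function and argument names are illustrative):

```python
import torch

def select_attention(attn, layer=None, head=None):
    """Reduce (layers, heads, query_len, key_len) -> (query_len, key_len).

    layer=None / head=None means "mean over that axis", mirroring the
    Mean Across Layers/Heads toggle; an integer picks a single slice.
    """
    a = attn.mean(dim=0) if layer is None else attn[layer]   # layers axis
    a = a.mean(dim=0) if head is None else a[head]           # heads axis
    return a
```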

---

## 🚀 Quickstart

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Token-Attention-Viewer
cd Token-Attention-Viewer
```

### 2) (Optional) Create a virtual environment

**Windows (PowerShell):**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux (bash/zsh):**

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3) Install requirements

Install the Python dependencies:

```bash
pip install -r requirements.txt
```
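
The pinned versions in `requirements.txt` are authoritative, but if you are assembling an environment by hand, the stack the app relies on amounts to roughly:

```bash
# accelerate is optional; it helps with loading larger models
pip install gradio transformers torch accelerate
```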


### 4) Run the app

```bash
python app.py
```

You should see Gradio report a local URL similar to:

```
Running on local URL:  http://127.0.0.1:7860
```

### 5) Open in your browser

Open the printed URL (default `http://127.0.0.1:7860`) in your browser.

---

## 🧭 How to use

1. **Model**: pick a model from the dropdown and click **Load / Switch Model**.

   * Small models (e.g., `distilgpt2`, `gpt2`) run on CPU.
   * Larger models (e.g., `mistralai/Mistral-7B-v0.1`) generally need a GPU with enough VRAM.
2. **Prompt**: enter your starting text.
3. **Generate**: click **Generate** to produce a continuation.
4. **Inspect**: select any **generated word** (radio buttons).

   * The paragraph box highlights where that word attends.
   * Toggle **Mean Across Layers/Heads** or choose a specific **layer/head**.
5. Repeat with different models or prompts.
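
Everything above rests on asking `generate` for attention weights. A standalone sketch of that call (model and prompt are illustrative; on recent `transformers` versions, `attn_implementation="eager"` is needed because the fused SDPA kernels don't return weights):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "distilgpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

inputs = tok("The cat sat on the", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=8,
    output_attentions=True,
    return_dict_in_generate=True,
)

# out.attentions: one tuple entry per generated token; each entry holds
# one tensor per layer, shaped (batch, heads, query_len, key_len).
print(out.attentions[0][0].shape)
```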

---

## 🧩 Files

* `app.py` — Gradio application (UI + model loading + attention visualization).
* `requirements.txt` — pinned Python dependencies (see the Quickstart above).
* `README.md` — this file.

---

## 🛠️ Troubleshooting

* **Radio/choices error**: If you switch models and see a Gradio “value not in choices” error, ensure the app resets the radio with `value=None` (the included code already does this).
* **`<|endoftext|>` shows up**: The app strips **trailing** special tokens from the generated segment, so EOS shouldn’t appear. If you still see it in the middle, your model truly generated it as a token.
* **OOM / model too large**:

  * Try a smaller model (`distilgpt2`, `gpt2`, `facebook/opt-125m`).
  * Reduce `Max New Tokens`.
  * Use CPU for smaller models, or a GPU with more VRAM (or half precision, sketched after this list) for bigger ones.
* **Slow generation**: Smaller models or CPU mode will be slower; consider using GPU and the `accelerate` package.
* **Missing tokenizer pad token**: The app sets `pad_token_id = eos_token_id` automatically when needed.
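
For reference, the last two fixes look roughly like this in `transformers` (a sketch, not the app's exact code; half precision and `device_map="auto"` assume a GPU and the `accelerate` package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
if tok.pad_token_id is None:       # GPT-2-style models ship without one
    tok.pad_token = tok.eos_token  # reuse EOS as the pad token

# Loading in half precision roughly halves VRAM for larger models.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)
```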

---

## 🔒 Access-gated models

Some model families (e.g., **LLaMA**, **Gemma**) require you to accept a license or request access on Hugging Face. Make sure your account has been granted access before loading those models.
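
Once access is granted, authenticate locally so the model files can be downloaded (standard Hugging Face CLI; the token comes from your account's settings page):

```bash
huggingface-cli login   # paste a read-scoped access token when prompted
```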

---


## 📣 Acknowledgments

* Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
* Attention visualization is based on the standard causal LM attention tensors returned by `generate(output_attentions=True)`.