manu02 committed on
Commit 25ea14a · verified · 1 Parent(s): 8b72e45

Update README.md

Files changed (1)
  1. README.md +134 -121
README.md CHANGED
@@ -1,122 +1,135 @@
- # Token-Attention-Viewer
- Token Attention Viewer is an interactive Gradio app that visualizes the self-attention weights inside transformer language models for every generated token. It helps researchers, students, and developers explore how models like GPT-2 or LLaMA focus on different parts of the input as they generate text.
-
- # Word-Level Attention Visualizer (Gradio)
-
- An interactive Gradio app to **generate text with a causal language model** and **visualize attention word-by-word**.
- Each word in the generated continuation is shown like a paragraph; the **background opacity** behind a word reflects the **sum of attention weights** that the selected (query) word assigns to the context. You can also switch between many popular Hugging Face models.
-
- ---
-
- ## What the app does
-
- * **Generate** a continuation from your prompt using a selected causal LM (GPT-2, OPT, Mistral, etc.).
- * **Select a generated word** to inspect.
- * **Visualize attention** as a semi-transparent background behind words (no plots/libraries like matplotlib).
- * **Mean across layers/heads** or inspect a specific layer/head.
- * **Proper detokenization** to real words (regex-based) and **EOS tokens are stripped** (no `<|endoftext|>` clutter).
- * **Paragraph wrapping**: words wrap to new lines automatically inside the box.
-
- ---
-
- ## 🚀 Quickstart
-
- ### 1) Clone
-
- ```bash
- git clone https://github.com/devMuniz02/Token-Attention-Viewer
- cd Token-Attention-Viewer
- ```
-
- ### 2) (Optional) Create a virtual environment
-
- **Windows (PowerShell):**
-
- ```powershell
- python -m venv venv
- .\venv\Scripts\Activate.ps1
- ```
-
- **macOS / Linux (bash/zsh):**
-
- ```bash
- python3 -m venv venv
- source venv/bin/activate
- ```
-
- ### 3) Install requirements
-
- Install:
-
- ```bash
- pip install -r requirements.txt
- ```
-
-
- ### 4) Run the app
-
- ```bash
- python app.py
- ```
-
- You should see Gradio report a local URL similar to:
-
- ```
- Running on local URL: http://127.0.0.1:7860
- ```
-
- ### 5) Open in your browser
-
- Open the printed URL (default `http://127.0.0.1:7860`) in your browser.
-
- ---
-
- ## 🧭 How to use
-
- 1. **Model**: pick a model from the dropdown and click **Load / Switch Model**.
-
- * Small models (e.g., `distilgpt2`, `gpt2`) run on CPU.
- * Larger models (e.g., `mistralai/Mistral-7B-v0.1`) generally need a GPU with enough VRAM.
- 2. **Prompt**: enter your starting text.
- 3. **Generate**: click **Generate** to produce a continuation.
- 4. **Inspect**: select any **generated word** (radio buttons).
-
- * The paragraph box highlights where that word attends.
- * Toggle **Mean Across Layers/Heads** or choose a specific **layer/head**.
- 5. Repeat with different models or prompts.
-
- ---
-
- ## 🧩 Files
-
- * `app.py` Gradio application (UI + model loading + attention visualization).
- * `requirements.txt` Python dependencies (see above).
- * `README.md` this file.
-
- ---
-
- ## 🛠️ Troubleshooting
-
- * **Radio/choices error**: If you switch models and see a Gradio “value not in choices” error, ensure the app resets the radio with `value=None` (the included code already does this).
- * **`<|endoftext|>` shows up**: The app strips **trailing** special tokens from the generated segment, so EOS shouldn’t appear. If you still see it in the middle, your model truly generated it as a token.
- * **OOM / model too large**:
-
- * Try a smaller model (`distilgpt2`, `gpt2`, `facebook/opt-125m`).
- * Reduce `Max New Tokens`.
- * Use CPU for smaller models or a GPU with more VRAM for bigger ones.
- * **Slow generation**: Smaller models or CPU mode will be slower; consider using GPU and the `accelerate` package.
- * **Missing tokenizer pad token**: The app sets `pad_token_id = eos_token_id` automatically when needed.
-
- ---
-
- ## 🔒 Access-gated models
-
- Some families (e.g., **LLaMA**, **Gemma**) require you to accept licenses or request access on Hugging Face. Make sure your Hugging Face account has access before trying to load those models.
-
- ---
-
-
- ## 📣 Acknowledgments
-
- * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
  * Attention visualization inspired by standard causal LM attention tensors available from `generate(output_attentions=True)`.
+ ---
+ title: Token Attention Viewer
+ emoji: 📈
+ colorFrom: gray
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.49.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Interactive visualization of attention weights in LLMs word-
+ ---
+
+ # Token-Attention-Viewer
+ Token Attention Viewer is an interactive Gradio app that visualizes the self-attention weights inside transformer language models for every generated token. It helps researchers, students, and developers explore how models like GPT-2 or LLaMA focus on different parts of the input as they generate text.
+
+ # Word-Level Attention Visualizer (Gradio)
+
+ An interactive Gradio app to **generate text with a causal language model** and **visualize attention word-by-word**.
+ Each word in the generated continuation is shown like a paragraph; the **background opacity** behind a word reflects the **sum of attention weights** that the selected (query) word assigns to the context. You can also switch between many popular Hugging Face models.
+
+ ---
+
+ ## What the app does
+
+ * **Generate** a continuation from your prompt using a selected causal LM (GPT-2, OPT, Mistral, etc.).
+ * **Select a generated word** to inspect.
+ * **Visualize attention** as a semi-transparent background behind words (no plots/libraries like matplotlib).
+ * **Mean across layers/heads** or inspect a specific layer/head.
+ * **Proper detokenization** to real words (regex-based) and **EOS tokens are stripped** (no `<|endoftext|>` clutter).
+ * **Paragraph wrapping**: words wrap to new lines automatically inside the box.
+
+ ---
+
+ ## 🚀 Quickstart
+
+ ### 1) Clone
+
+ ```bash
+ git clone https://github.com/devMuniz02/Token-Attention-Viewer
+ cd Token-Attention-Viewer
+ ```
+
+ ### 2) (Optional) Create a virtual environment
+
+ **Windows (PowerShell):**
+
+ ```powershell
+ python -m venv venv
+ .\venv\Scripts\Activate.ps1
+ ```
+
+ **macOS / Linux (bash/zsh):**
+
+ ```bash
+ python3 -m venv venv
+ source venv/bin/activate
+ ```
+
+ ### 3) Install requirements
+
+ Install:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+
+ ### 4) Run the app
+
+ ```bash
+ python app.py
+ ```
+
+ You should see Gradio report a local URL similar to:
+
+ ```
+ Running on local URL: http://127.0.0.1:7860
+ ```
+
+ ### 5) Open in your browser
+
+ Open the printed URL (default `http://127.0.0.1:7860`) in your browser.
+
+ ---
+
+ ## 🧭 How to use
+
+ 1. **Model**: pick a model from the dropdown and click **Load / Switch Model**.
+
+ * Small models (e.g., `distilgpt2`, `gpt2`) run on CPU.
+ * Larger models (e.g., `mistralai/Mistral-7B-v0.1`) generally need a GPU with enough VRAM.
+ 2. **Prompt**: enter your starting text.
+ 3. **Generate**: click **Generate** to produce a continuation.
+ 4. **Inspect**: select any **generated word** (radio buttons).
+
+ * The paragraph box highlights where that word attends.
+ * Toggle **Mean Across Layers/Heads** or choose a specific **layer/head**.
+ 5. Repeat with different models or prompts.
+
+ ---
+
+ ## 🧩 Files
+
+ * `app.py` Gradio application (UI + model loading + attention visualization).
+ * `requirements.txt` Python dependencies (see above).
+ * `README.md` this file.
+
+ ---
+
+ ## 🛠️ Troubleshooting
+
+ * **Radio/choices error**: If you switch models and see a Gradio “value not in choices” error, ensure the app resets the radio with `value=None` (the included code already does this).
+ * **`<|endoftext|>` shows up**: The app strips **trailing** special tokens from the generated segment, so EOS shouldn’t appear. If you still see it in the middle, your model truly generated it as a token.
+ * **OOM / model too large**:
+
+ * Try a smaller model (`distilgpt2`, `gpt2`, `facebook/opt-125m`).
+ * Reduce `Max New Tokens`.
+ * Use CPU for smaller models or a GPU with more VRAM for bigger ones.
+ * **Slow generation**: Smaller models or CPU mode will be slower; consider using GPU and the `accelerate` package.
+ * **Missing tokenizer pad token**: The app sets `pad_token_id = eos_token_id` automatically when needed.
+
+ ---
+
+ ## 🔒 Access-gated models
+
+ Some families (e.g., **LLaMA**, **Gemma**) require you to accept licenses or request access on Hugging Face. Make sure your Hugging Face account has access before trying to load those models.
+
+ ---
+
+
+ ## 📣 Acknowledgments
+
+ * Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
  * Attention visualization inspired by standard causal LM attention tensors available from `generate(output_attentions=True)`.
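
The README in this diff describes the core idea only in prose: the background opacity behind each word reflects the sum of attention weights the selected word assigns to it, either averaged across layers and heads or taken from one specific layer/head. The sketch below illustrates that mapping in plain Python. It is not taken from `app.py`: the function names, the nested-list layout `[layer][head][context_token]`, the normalization by the largest word total, and the red highlight color are all assumptions made for illustration.

```python
from html import escape


def token_attention(attn, layer=None, head=None):
    """Reduce [layer][head][token] weights to one weight per context token.

    Hypothetical helper: when `layer` or `head` is None, the mean over
    that axis is taken; otherwise only the chosen index is used.
    """
    layers = attn if layer is None else [attn[layer]]
    rows = []
    for heads in layers:
        rows.extend(heads if head is None else [heads[head]])
    # Column-wise mean over every selected (layer, head) row.
    return [sum(col) / len(rows) for col in zip(*rows)]


def word_opacities(token_weights, token_to_word, n_words):
    """Sum token weights per word, then scale so the top word is 1.0.

    Scaling by the maximum is an assumption; the README only says the
    opacity reflects the summed attention.
    """
    totals = [0.0] * n_words
    for weight, word_idx in zip(token_weights, token_to_word):
        totals[word_idx] += weight
    peak = max(totals) if totals else 0.0
    return [t / peak if peak > 0 else 0.0 for t in totals]


def render_html(words, opacities):
    """Wrap each word in a span whose background alpha is its opacity."""
    spans = [
        f'<span style="background: rgba(255, 0, 0, {a:.2f})">{escape(w)}</span>'
        for w, a in zip(words, opacities)
    ]
    return " ".join(spans)


if __name__ == "__main__":
    # Toy input: 2 layers, 2 heads, 3 context tokens forming 2 words.
    attn = [
        [[0.1, 0.2, 0.7], [0.3, 0.2, 0.5]],
        [[0.2, 0.2, 0.6], [0.4, 0.2, 0.4]],
    ]
    weights = token_attention(attn)
    print(render_html(["Hello", "world"], word_opacities(weights, [0, 0, 1], 2)))
```

Summing before normalizing keeps multi-token words comparable to single-token ones; passing `layer` and `head` switches from the mean view to a single attention head.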