Spaces: Running on Zero
Update app.py
app.py CHANGED
@@ -1,6 +1,3 @@
-I'll create a chat application using the Qwen3-VL-4B-Instruct model that can handle both text and image inputs. This will be a multimodal chatbot that can analyze images and respond to questions about them.
-
-```python
 import gradio as gr
 import torch
 from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
@@ -320,40 +317,4 @@ if __name__ == "__main__":
 show_error=True,
 share=False,
 debug=True
-)
-```
-
-Now let's create the requirements.txt file:
-
-```
-gradio
-transformers
-torch
-torchvision
-spaces
-Pillow
-numpy
-accelerate
-sentencepiece
-einops
-transformers_stream_generator
-```
-
-This application creates a multimodal chat interface with the following features:
-
-1. **Multimodal Input**: Users can send text messages, images, or both
-2. **Vision-Language Understanding**: The Qwen3-VL model can analyze images and answer questions about them
-3. **Chat History**: Maintains conversation context
-4. **Interactive Controls**: Retry, undo, and clear buttons for better user experience
-5. **GPU Optimization**: Uses the @spaces.GPU decorator for efficient inference
-6. **Clean UI**: Professional interface with helpful tips and examples
-
-The app can:
-- Describe images in detail
-- Answer questions about image content
-- Count objects in images
-- Read text from images
-- Discuss colors, composition, and mood
-- Maintain conversational context
-
-The interface is user-friendly with a clean design and provides guidance on how to use the multimodal capabilities effectively.
+)
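The change above keeps only the Python source that sat inside the first fenced block of the generated text and drops the prose, the requirements listing, and the feature summary around it. A minimal sketch of automating that cleanup follows. It assumes the generated text wraps the code in a single fence tagged `python`; the helper name `extract_python_block` and the sample `response` string are hypothetical and do not come from the Space.

```python
import re

# Build the fence marker from single backticks so this snippet can itself be
# quoted inside a fenced block without confusing a renderer.
FENCE = "`" * 3

# Hypothetical helper (not part of the Space): capture everything between an
# opening fence tagged "python" and the next line that starts a closing fence.
FENCE_RE = re.compile(FENCE + r"python[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


def extract_python_block(text: str) -> str:
    """Return the body of the first python-tagged fenced block.

    If no such block is found, the text is returned unchanged so that an
    already-clean file passes through untouched.
    """
    match = FENCE_RE.search(text)
    if match is None:
        return text
    return match.group(1) + "\n"


# Made-up sample shaped like the removed lines in the diff above.
response = (
    "I'll create a chat application.\n"
    "\n"
    + FENCE + "python\n"
    "import gradio as gr\n"
    "print('hello')\n"
    + FENCE + "\n"
    "\n"
    "Now let's create the requirements.txt file:\n"
)
cleaned = extract_python_block(response)
```

With this sample, `cleaned` holds just the two Python lines. The non-greedy match stops at the first line that begins with a closing fence, so code that itself contains such a line (for example inside a multi-line string) would be cut short; a real cleanup step would need to handle that case.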