readme cleanup

README.md CHANGED

@@ -7,97 +7,10 @@ license: mit

This demo illustrates the work published in the paper ["Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models"](https://arxiv.org/pdf/2402.07865.pdf)

> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).

---

## Installation

This repository can be installed as follows:

```bash
git clone git@github.com:TRI-ML/vlm-demo.git
cd vlm-demo
pip install -e .
```

This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is
installed in the current environment. Installation instructions can be found
[here](https://github.com/TRI-ML/vlm-evaluation/tree/main).
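
For reference, here is a minimal sketch of that second install, assuming `vlm-evaluation` follows the same clone-and-install pattern as this repo (defer to the linked instructions if they differ):

```bash
# Assumed companion install; the authoritative steps are in the
# vlm-evaluation README linked above.
git clone git@github.com:TRI-ML/vlm-evaluation.git
cd vlm-evaluation
pip install -e .
```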
## Usage

The main script to run is `interactive_demo.py`, while the implementations of
the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server
(`serve/gradio_web_server.py`) are within `serve`. All of this code is heavily
adapted from the [LLaVA GitHub Repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
More details on how this code was modified from the original LLaVA repo are provided in the
relevant source files.

To run the demo, first run the following commands in separate terminals:

+ Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
+ Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`

To run the interactive demo, you can specify a model to chat with via a `model_id` or `model_dir` as follows:

+ `python -m interactive_demo --port 40000 --model_id <MODEL_ID>`, OR
+ `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>`

If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals.
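
As a concrete sketch (the second port and the placeholder model IDs are illustrative, and the controller and web server are assumed to be running already), the separate terminals can also be replaced by background jobs:

```bash
# Launch two interactive demos side by side; each needs its own port.
python -m interactive_demo --port 40000 --model_id <MODEL_ID_1> &
python -m interactive_demo --port 40001 --model_id <MODEL_ID_2> &
wait  # keep the shell attached to both demos
```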

When running the demo, the following parameters are adjustable:

+ Temperature
+ Max output tokens

The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other
interaction modes for more specific use cases:

+ Captioning: Here, you can simply upload an image with no provided prompt and the selected model will output a caption. Even if a prompt
is input by the user, it will not be used in producing the caption.
+ Bounding Box Prediction: After uploading an image, simply specify in the prompt the portion of the image for which bounding box
coordinates are desired, and the selected model will output the corresponding coordinates.
+ Visual Question Answering: Selecting this option is best when the user wants short, succinct answers to a specific question provided in the
prompt.
+ True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
prompt.

## Example

To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following commands in separate terminals.

Launch the Gradio controller:

`python -m serve.controller --host 0.0.0.0 --port 10000`

Launch the web server:

`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`

Now we can launch an interactive demo corresponding to each of the models we want to chat with. For Prism models, you
only need to specify a `model_id`, while for LLaVA and InstructBLIP, you need to additionally specify a `model_family`
and `model_dir`. Note that for each model, a different port must be specified.

Launch the interactive demo for the Prism 7B model:

`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b`

Launch the interactive demo for the LLaVA 1.5 7B model:

`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b`
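
Putting the whole example together, a single launcher script might look like the following sketch; the `sleep` delays and backgrounding are assumptions about startup ordering, not part of the documented workflow:

```bash
#!/usr/bin/env bash
# Sketch: bring up the controller, the web server, and both demos from
# this example in one script instead of four terminals.
python -m serve.controller --host 0.0.0.0 --port 10000 &
sleep 10  # assumption: give the controller time to start listening

python -m serve.gradio_web_server --controller http://localhost:10000 \
  --model-list-mode reload --share &
sleep 10  # assumption: give the web server time to come up

python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b &
python -m interactive_demo --port 40001 --model_family llava-v15 \
  --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b &
wait  # keep all four processes attached to this shell
```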
## Contributing

Before committing to the repository, *make sure to set up your dev environment!*

Here are the basic development environment setup guidelines:

+ Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies
(e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`. See the sketch after this list.
+ Install `pre-commit` hooks (`pre-commit install`).
+ Branch for the specific feature/issue, issuing a PR against the upstream repository for review.
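
Condensed into commands, that setup might look like the following sketch (the fork URL and branch name are placeholders, not part of the repo's documented workflow):

```bash
# Assumed dev setup; <USERNAME> stands in for your GitHub fork.
git clone git@github.com:<USERNAME>/vlm-demo.git
cd vlm-demo
pip install -e ".[dev]"   # installs black, ruff, and pre-commit
pre-commit install        # register the pre-commit hooks
git checkout -b my-feature-branch
```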

---

# Source code

For more information, please refer to this repository:

> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our
[VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).