microsoft
/

GUI-Actor-7B-Qwen2-VL

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions

qianhuiwu commited on Jun 9

Commit

da4710d

·

verified ·

1 Parent(s): 496c8ef

add dataset link.

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text
 # GUI-Actor-7B with Qwen2-VL-7B as backbone VLM
 This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://huggingface.co/papers/2506.03143).
-It is developed based on [Qwen2-VL-7B-Instruct ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here (coming soon)]().
 For more details on model design and evaluation, please check: [🏠 Project Page](https://microsoft.github.io/GUI-Actor/) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper](https://www.arxiv.org/pdf/2506.03143).

 # GUI-Actor-7B with Qwen2-VL-7B as backbone VLM
 This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://huggingface.co/papers/2506.03143).
+It is developed based on [Qwen2-VL-7B-Instruct ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here](https://huggingface.co/datasets/cckevinn/GUI-Actor-Data).
 For more details on model design and evaluation, please check: [🏠 Project Page](https://microsoft.github.io/GUI-Actor/) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper](https://www.arxiv.org/pdf/2506.03143).