InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation



InternVLA-A1 integrates understanding, generation, and action experts into a unified model that synergizes the semantic reasoning of MLLMs with world-model-style dynamics prediction to guide action execution.

Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at the 2B and 3B parameter scales. The released InternVLA-A1 series covers these model scales and several pre-training data configurations.

πŸ”‘ Key Features

Architecturally, InternVLA-A1 employs a Mixture-of-Transformers (MoT) design to unify semantic understanding, visual foresight, and action prediction, effectively synergizing high-level reasoning with low-level dynamics.
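
For intuition, below is a minimal, hypothetical sketch of the MoT pattern: modality-specific experts keep separate feed-forward weights while sharing self-attention over one joint token sequence. The class name, dimensions, and routing scheme are illustrative assumptions, not the released InternVLA-A1 code.

```python
# Toy Mixture-of-Transformers (MoT) block: three experts (e.g. understanding /
# generation / action tokens) share one self-attention layer but use separate
# feed-forward weights. Illustrative only; not the InternVLA-A1 implementation.
import torch
import torch.nn as nn


class ToyMoTBlock(nn.Module):
    def __init__(self, dim=256, heads=4, num_experts=3):
        super().__init__()
        # Shared self-attention lets tokens of all modalities exchange information.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One feed-forward expert per token type.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, tokens, expert_ids):
        # tokens:     (B, T, dim) joint sequence across modalities
        # expert_ids: (B, T) integer id selecting the expert for each token
        h = self.norm1(tokens)
        attn_out, _ = self.attn(h, h, h)
        tokens = tokens + attn_out
        h = self.norm2(tokens)
        out = torch.zeros_like(h)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            out[mask] = expert(h[mask])
        return tokens + out


if __name__ == "__main__":
    block = ToyMoTBlock()
    x = torch.randn(2, 10, 256)
    ids = torch.randint(0, 3, (2, 10))
    print(block(x, ids).shape)  # torch.Size([2, 10, 256])
```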


Our hybrid synthetic-real pre-training strategy combines the scene diversity of simulation with the physical fidelity of real-world data.
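
For illustration only, a toy sampler that mixes simulated and real episodes at a fixed ratio; the function name, the 50/50 ratio, and the sampling scheme are assumptions, not the released pre-training recipe.

```python
# Hypothetical sketch of hybrid synthetic-real data mixing during pre-training.
import random


def sample_batch(sim_episodes, real_episodes, batch_size=8, sim_ratio=0.5):
    """Draw a training batch mixing simulated and real episodes at a fixed ratio."""
    batch = []
    for _ in range(batch_size):
        pool = sim_episodes if random.random() < sim_ratio else real_episodes
        batch.append(random.choice(pool))
    return batch


if __name__ == "__main__":
    sim = [f"sim_episode_{i}" for i in range(100)]
    real = [f"real_episode_{i}" for i in range(20)]
    print(sample_batch(sim, real))
```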


Demonstrations

⚑ Dynamic Manipulation

InternVLA-A1 exhibits exceptional robustness in highly dynamic scenarios.

πŸ€– Daily Tasks

InternVLA-A1 also demonstrates strong proficiency in dexterous and fine-grained manipulation on everyday tasks.

Usage

Please refer to our official repo InternVLA-A1.
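
As a hedged starting point (assuming standard Hugging Face Hub tooling; the authoritative inference instructions live in the official repo), the released checkpoint can be fetched like this:

```python
# Minimal sketch: download the InternVLA-A1-3B weights from the Hugging Face Hub.
# This only fetches the checkpoint files; see the official InternVLA-A1 repo
# for the actual inference pipeline.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="InternRobotics/InternVLA-A1-3B")
print(f"Checkpoint downloaded to: {local_dir}")
```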

License and Citation

All the code within this repo is released under the CC BY-NC-SA 4.0 license. Please consider citing our project if it helps your research.

@article{contributors2026internvla_a1,
  title={InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation},
  author={InternVLA-A1 contributors},
  journal={arXiv preprint arXiv:2601.02456},
  year={2026}
}

Acknowledgments
