VAMOS: A Hierarchical Vision-Language-Action Model for Capab • Collection • 7 items
This collection contains VLM planner checkpoints, affordance module checkpoints for Spot and HOUND, training datasets, and a demo.
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing • Paper • arXiv:2510.19808
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs • Paper • arXiv:2510.18876
HunyuanWorld-Mirror 🌍 • Space • Universal 3D World Reconstruction with Any Prior Prompting