Update README.md
README.md (CHANGED)
@@ -27,9 +27,9 @@ tags:

</div>

-<div align="center">
+<!-- <div align="center">
<img src="https://github.com/InternRobotics/VLAC/tree/main/data/title_banner-2.gif" alt="VLAC banner" width="800"></img>
-</div>
+</div> -->

## VLAC-2B

@@ -51,15 +51,15 @@ VLAC trained on 3000h+ human egocentric data, 1200h+ comprehensive public roboti

• **Trajectory quality screening** - VLAC can evaluate collected trajectories, filter out low-score trajectories based on the VOC value, and mask actions with a negative pair-wise score, that is, data with low fluency and quality, improving the effectiveness and efficiency of imitation learning.

-## Framework
+<!-- ## Framework

<div align="center">
-<img src="https://github.com/InternRobotics/VLAC/
+<img src="https://github.com/InternRobotics/VLAC/blob/main/data/framework.png" alt="VLAC Framework" width="800"/>
</div>

-*The VLAC model is trained on a combination of comprehensive public robotic manipulation datasets, human demonstration data, self-collected manipulation data, and various image understanding datasets. Video data is processed into pair-wise samples to learn the difference in task progress between any two frames, supplemented with task descriptions and task completion evaluation to enable task progress understanding and action generation, as illustrated in the bottom-left corner. As shown in the diagram on the right, the model demonstrates strong generalization to new robots, scenarios, and tasks not covered in the training dataset. It can predict task progress and distinguish failed actions or trajectories, providing dense reward feedback for real-world reinforcement learning and offering guidance for data refinement. Additionally, the model can directly perform manipulation tasks, exhibiting zero-shot capabilities to handle different scenarios.*
+*The VLAC model is trained on a combination of comprehensive public robotic manipulation datasets, human demonstration data, self-collected manipulation data, and various image understanding datasets. Video data is processed into pair-wise samples to learn the difference in task progress between any two frames, supplemented with task descriptions and task completion evaluation to enable task progress understanding and action generation, as illustrated in the bottom-left corner. As shown in the diagram on the right, the model demonstrates strong generalization to new robots, scenarios, and tasks not covered in the training dataset. It can predict task progress and distinguish failed actions or trajectories, providing dense reward feedback for real-world reinforcement learning and offering guidance for data refinement. Additionally, the model can directly perform manipulation tasks, exhibiting zero-shot capabilities to handle different scenarios.* -->

-## Performance
+## Framework & Performance

Details about the model's performance and evaluation metrics can be found in the [Homepage](https://vlac.intern-ai.org.cn/).
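The trajectory-screening bullet in the diff above describes two levels of filtering: dropping whole trajectories whose VOC value is low, and masking individual actions whose pair-wise score is negative. A minimal sketch of that idea, assuming hypothetical `voc_score` and `pairwise_scores` callables that wrap the model's outputs (neither name is part of the VLAC API):

```python
# Illustrative sketch only: the scoring callables are assumed stand-ins,
# not functions from the VLAC repository.

def screen_trajectories(trajectories, voc_score, pairwise_scores, voc_threshold=0.5):
    """Keep high-VOC trajectories and mask actions with negative pair-wise scores."""
    kept = []
    for traj in trajectories:
        # Trajectory-level screening: discard low-fluency / low-quality rollouts.
        if voc_score(traj) < voc_threshold:
            continue
        # Step-level screening: mask actions whose pair-wise progress score is
        # negative, so imitation learning ignores them.
        scores = pairwise_scores(traj)  # one score per consecutive frame pair
        traj["action_mask"] = [s >= 0.0 for s in scores]
        kept.append(traj)
    return kept
```

The 0.5 cutoff is arbitrary here; the README does not specify a threshold.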
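The commented-out framework paragraph also mentions that pair-wise task-progress predictions can provide dense reward feedback for real-world reinforcement learning. One way to picture that, as a rough sketch with an assumed `predict_progress(prev_frame, next_frame, task)` callable standing in for the model:

```python
# Rough sketch: converting pair-wise progress predictions into per-step rewards.
# `predict_progress` is an assumed stand-in, not the VLAC API.

def dense_rewards(frames, task, predict_progress):
    """Reward each step by the predicted change in task progress between consecutive frames."""
    return [
        predict_progress(prev_frame, next_frame, task)  # signed progress delta
        for prev_frame, next_frame in zip(frames[:-1], frames[1:])
    ]
```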