Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning • Paper • 2510.14300 • Published 14 days ago • 8
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing • Paper • 2510.05213 • Published 23 days ago • 5
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs • Paper • 2509.09174 • Published Sep 11 • 57
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies • Paper • 2508.20072 • Published Aug 27 • 30
HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents • Paper • 2508.02629 • Published Aug 4 • 5
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • Paper • 2406.08418 • Published Jun 12, 2024 • 31
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots • Paper • 2405.07990 • Published May 13, 2024 • 20