arxiv:2503.21696
Wenqi Zhang
zwq2018
AI & ML interests
LLM, Multimodal, Robotics
Recent Activity
upvoted a paper about 3 hours ago
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
liked a dataset 18 days ago
hongxingli/SPBench
upvoted a paper 18 days ago
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models