Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published 7 days ago • 62
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 3 days ago • 43
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 20 • 53
Perflow-Shuai/streaming_vlm_e1_lr2e-5_dt_rebuttal_stage2_ps512_pw512_from_qwen_run2-checkpoint-42-model 8B • Updated Nov 18 • 4
LongAI Collection Boost AI's Long ability while staying efficient. Models in this collection include LongVILA, LongVILA-R1, and LongLive. • 8 items • Updated Nov 6 • 2
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Paper • 2511.01678 • Published Nov 3 • 35
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 177
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting Paper • 2510.21817 • Published Oct 21 • 41
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking Paper • 2303.11301 • Published Mar 20, 2023
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25, 2024 • 6
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection Paper • 2308.04556 • Published Aug 8, 2023 • 9