TimeViper Collection A hybrid vision-language model for long video understanding • 2 items • Updated Nov 23 • 1
Time-R1 Collection Time-R1: Post-Training Large Vision-Language Model for Temporal Video Grounding • 4 items • Updated Nov 23 • 3
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Paper • 2511.16595 • Published Nov 20 • 9
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published Nov 17 • 25
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World Paper • 2403.05856 • Published Mar 9, 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks Paper • 2409.06709 • Published Aug 25, 2024
SPAFormer: Sequential 3D Part Assembly with Transformers Paper • 2403.05874 • Published Mar 9, 2024 • 1
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? Paper • 2405.17719 • Published May 28, 2024 • 1