Cyril666/whisper-large-v3-encoder Automatic Speech Recognition • 0.6B • Updated 9 days ago • 135
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 15 days ago • 19
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published 15 days ago • 10
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement Paper • 2512.13303 • Published 18 days ago • 16
DAMO-NLP-SG/VideoLLaMA3-7B-Image Visual Question Answering • 8B • Updated Mar 20, 2025 • 324 • 10