UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration Paper • 2509.22570 • Published Sep 26 • 3
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14 • 2
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios Paper • 2506.13977 • Published Jun 11 • 10
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Paper • 2504.05594 • Published Apr 8 • 11
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Paper • 2504.05594 • Published Apr 8 • 11 • 3
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29 • 8
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29 • 8
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29 • 8
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29 • 8