DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12, 2024
Learning Multi-dimensional Human Preference for Text-to-Image Generation Paper • 2405.14705 • Published May 23, 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation Paper • 2406.10462 • Published Jun 15, 2024
Decouple Content and Motion for Conditional Image-to-Video Generation Paper • 2311.14294 • Published Nov 24, 2023
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization Paper • 2502.01051 • Published Feb 3, 2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14, 2025
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning Paper • 2505.21067 • Published May 27, 2025
InstructEngine: Instruction-driven Text-to-Image Alignment Paper • 2504.10329 • Published Apr 14, 2025
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types Paper • 2502.09925 • Published Feb 14, 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models Paper • 2504.08809 • Published Apr 9, 2025