Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025 • 45 • 5
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 224
Improving Fake News Detection by Using an Entity-enhanced Framework to Fuse Diverse Multimodal Clues Paper • 2108.10509 • Published Aug 24, 2021
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models Paper • 2308.13437 • Published Aug 25, 2023 • 4
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models Paper • 2402.13607 • Published Feb 21, 2024
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025 • 45
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published Nov 26, 2025 • 45
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5, 2025 • 52
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published Jan 8, 2025 • 53