From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • Paper • 2510.14979 • Published 12 days ago • 65
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation • Paper • 2504.02782 • Published Apr 3 • 57
Position: Interactive Generative Video as Next-Generation Game Engine • Paper • 2503.17359 • Published Mar 21 • 61
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models • Paper • 2502.06788 • Published Feb 10 • 13
Autoregressive Video Generation without Vector Quantization • Paper • 2412.14169 • Published Dec 18, 2024 • 14