DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models • Paper • 2512.15713 • Published 7 days ago • 15
VTP • Collection • Towards Scalable Pre-training of Visual Tokenizers for Generation • 4 items • Updated 8 days ago • 39
Towards Scalable Pre-training of Visual Tokenizers for Generation • Paper • 2512.13687 • Published 9 days ago • 93
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models • Paper • 2512.02014 • Published 23 days ago • 66
Seedream 4.0: Toward Next-generation Multimodal Image Generation • Paper • 2509.20427 • Published Sep 24 • 81
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • 2506.13585 • Published Jun 16 • 273
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder • Paper • 2505.07916 • Published May 12 • 134
PixelHacker: Image Inpainting with Structural and Semantic Consistency • Paper • 2504.20438 • Published Apr 29 • 44
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math • Paper • 2504.21233 • Published Apr 30 • 49
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation • Paper • 2504.12626 • Published Apr 17 • 51
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers • Paper • 2504.10483 • Published Apr 14 • 21
OmniSVG: A Unified Scalable Vector Graphics Generation Model • Paper • 2504.06263 • Published Apr 8 • 182
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper • 2504.01990 • Published Mar 31 • 301