Next-Embedding Prediction Makes Strong Vision Learners • Paper • 2512.16922 • Published 7 days ago • 77
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization • Paper • 2503.23377 • Published Mar 30 • 57