Kimi Linear: An Expressive, Efficient Attention Architecture • Paper • 2510.26692 • Published 3 days ago
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 20 days ago
DINOv3 • Collection • DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning (https://arxiv.org/abs/2508.10104) • 13 items • Updated Aug 21
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations • Paper • 2509.09676 • Published Sep 11
A Survey of Reinforcement Learning for Large Reasoning Models • Paper • 2509.08827 • Published Sep 10
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation • Paper • 2508.15774 • Published Aug 21
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published Aug 25
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models • Paper • 2508.12945 • Published Aug 18
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model • Paper • 2508.13009 • Published Aug 18
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off • Paper • 2508.04825 • Published Aug 6
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale • Paper • 2508.10711 • Published Aug 14
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control • Paper • 2508.08134 • Published Aug 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models • Paper • 2508.06471 • Published Aug 8