VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator Paper • 2510.13454 • Published 14 days ago • 6
Learning an Image Editing Model without Image Editing Pairs Paper • 2510.14978 • Published 13 days ago • 7
SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Paper • 2505.19151 • Published May 25 • 2
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking tasks • 7 items • Updated 3 days ago • 75
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 152
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 104
Tulu 3 Datasets Collection All datasets released with Tulu 3: state-of-the-art open post-training recipes. • 33 items • Updated Sep 18 • 94
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 38
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 54
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 100