Parallel Loop Transformer for Efficient Test-Time Computation Scaling Paper • 2510.24824 • Published 6 days ago • 13
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published 7 days ago • 16
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets Paper • 2510.19944 • Published 12 days ago • 18
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published 19 days ago • 30
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published 19 days ago • 24
Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published 26 days ago • 28
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 43
VeriThinker: Learning to Verify Makes Reasoning Model Efficient Paper • 2505.17941 • Published May 23 • 25
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Paper • 2505.16990 • Published May 22 • 22
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning Paper • 2503.07906 • Published Mar 10 • 4
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 87
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22, 2024 • 61
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Paper • 2410.10818 • Published Oct 14, 2024 • 17