Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published 16 days ago • 48
RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling Paper • 2510.18917 • Published 12 days ago • 4
The Unanticipated Asymmetry Between Perceptual Optimization and Assessment Paper • 2509.20878 • Published Sep 25 • 3
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space Paper • 2508.00413 • Published Aug 1 • 5
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model Paper • 2408.17175 • Published Aug 30, 2024 • 6
Dedelayed: Deleting remote inference delay via on-device correction Paper • 2510.13714 • Published 17 days ago • 1
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling Paper • 2510.09212 • Published 23 days ago • 14
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction Paper • 2510.04759 • Published 27 days ago • 9
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models Paper • 2510.09008 • Published 23 days ago • 15
UniFusion: Vision-Language Model as Unified Encoder in Image Generation Paper • 2510.12789 • Published 18 days ago • 16
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published 18 days ago • 36