An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper • 2406.04338 • Published Jun 6, 2024 • 39
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 108
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 107
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published Jan 7 • 52
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 34
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 37
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13 • 8
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator Paper • 2502.19204 • Published Feb 26 • 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27 • 31
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28 • 26
AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding Paper • 2503.01063 • Published Mar 2 • 5
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective Paper • 2503.01933 • Published Mar 3 • 13
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6 • 72
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3 • 32
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models Paper • 2503.08417 • Published Mar 11 • 8
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 73
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation Paper • 2503.10636 • Published Mar 13 • 3
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14 • 145
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20 • 72
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Paper • 2503.20240 • Published Mar 26 • 22
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling Paper • 2503.21732 • Published Mar 27 • 9
X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction Paper • 2503.21779 • Published Mar 27 • 4
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published Mar 25 • 76
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 135
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 33
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22 • 24
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks Paper • 2505.11881 • Published May 17 • 4
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model Paper • 2506.15682 • Published Jun 18 • 5
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling Paper • 2506.20452 • Published Jun 25 • 19
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images Paper • 2506.22960 • Published Jun 28 • 6
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection Paper • 2507.07994 • Published Jul 10 • 2
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published Jul 17 • 9
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2 • 6
ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction Paper • 2510.01061 • Published 27 days ago • 2
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion Paper • 2412.10437 • Published Dec 11, 2024 • 6
Image-GS: Content-Adaptive Image Representation via 2D Gaussians Paper • 2407.01866 • Published Jul 2, 2024 • 1
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets Paper • 2510.19944 • Published 6 days ago • 15
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning Paper • 2505.20161 • Published May 26 • 1