Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Paper
•
2504.16064
•
Published
•
14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Paper
•
2504.14032
•
Published
•
7
Towards Understanding Camera Motions in Any Video
Paper
•
2504.15376
•
Published
•
155
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper
•
2504.17192
•
Published
•
120
3D Scene Generation: A Survey
Paper
•
2505.05474
•
Published
•
21
DDT: Decoupled Diffusion Transformer
Paper
•
2504.05741
•
Published
•
77
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
Paper
•
2504.06801
•
Published
•
4
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Paper
•
2504.07961
•
Published
•
5
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
Paper
•
2504.09621
•
Published
•
11
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
Paper
•
2504.13072
•
Published
•
13
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
Paper
•
2504.12364
•
Published
•
22
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper
•
2504.05303
•
Published
•
5
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Paper
•
2504.07405
•
Published
•
11
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Paper
•
2504.08727
•
Published
•
12
MIEB: Massive Image Embedding Benchmark
Paper
•
2504.10471
•
Published
•
20
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting
Paper
•
2504.09048
•
Published
•
7
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Paper
•
2504.10483
•
Published
•
21
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper
•
2504.13180
•
Published
•
19
Visual Planning: Let's Think Only with Images
Paper
•
2505.11409
•
Published
•
57
Constructing a 3D Town from a Single Image
Paper
•
2505.15765
•
Published
•
24
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Paper
•
2505.12448
•
Published
•
10
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
Paper
•
2506.06253
•
Published
•
9
Image Reconstruction as a Tool for Feature Analysis
Paper
•
2506.07803
•
Published
•
29
Vision Transformers Don't Need Trained Registers
Paper
•
2506.08010
•
Published
•
22