PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation Paper • 2512.04025 • Published 23 days ago • 2
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Paper • 2510.19400 • Published Oct 22
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17 • 1
TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image Paper • 2503.12779 • Published Mar 17
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23 • 24
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models Paper • 2406.09098 • Published Jun 13, 2024 • 1
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published Jun 4 • 48
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting Paper • 2506.05327 • Published Jun 5 • 11
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS Paper • 2505.23734 • Published May 29 • 4