Tong He

tonghe90

http://tonghe90.github.io

AI & ML interests

SII is an institution dedicated to innovation in education and research in the field of AI

Recent Activity

upvoted a paper 29 days ago

BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation

upvoted a paper about 1 month ago

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

authored a paper about 1 month ago

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Organizations

authored 5 papers about 1 month ago

authored 15 papers 3 months ago

Aether: Geometric-Aware Unified World Modeling

Paper • 2503.18945 • Published Mar 24 • 28

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

Paper • 2507.13347 • Published Jul 17 • 64

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Paper • 2503.05689 • Published Mar 7 • 3

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published Feb 24 • 52

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 159

Depth Any Video with Scalable Synthetic Data

Paper • 2410.10815 • Published Oct 14, 2024 • 2

SAM3D: Segment Anything in 3D Scenes

Paper • 2306.03908 • Published Jun 6, 2023 • 1

Point Transformer V3: Simpler, Faster, Stronger

Paper • 2312.10035 • Published Dec 15, 2023 • 21

GVGEN: Text-to-3D Generation with Volumetric Representation

Paper • 2403.12957 • Published Mar 19, 2024 • 6

Ponder: Point Cloud Pre-training via Neural Rendering

Paper • 2301.00157 • Published Dec 31, 2022

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Paper • 2406.10163 • Published Jun 14, 2024 • 33

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Paper • 2410.08208 • Published Oct 10, 2024

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

Paper • 2407.08418 • Published Jul 11, 2024

Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18 • 64

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Paper • 2507.01016 • Published Jul 1 • 1
