Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach Paper • 2512.02834 • Published 29 days ago • 39
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
Florence2 + SAM2 • Segment and caption objects in images and videos • 515
Physical AI Collection Collection of open, commercial-grade datasets for physical AI developers • 23 items • Updated 7 days ago • 103
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10 • 39
openai/whisper-medium Automatic Speech Recognition • 0.8B • Updated Feb 29, 2024 • 715k • 271