view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Jul 29 • 190
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published Jul 21 • 38
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset Paper • 2507.21033 • Published Jul 28 • 20
Encoders vs Decoders: the Ettin Suite Collection A collection of SOTA, open-data, paired encoder-only and decoder only models ranging from 17M params to 1B. See the paper at https://arxiv.org/abs/250 • 32 items • Updated Jul 16 • 24
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 156
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 121
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images Paper • 2406.13735 • Published Jun 19, 2024 • 6
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15, 2024 • 70
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm Paper • 2403.11781 • Published Mar 18, 2024 • 19
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18, 2024 • 70
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions Paper • 2401.01827 • Published Jan 3, 2024 • 18
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices Paper • 2312.16886 • Published Dec 28, 2023 • 22
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Paper • 2312.11392 • Published Dec 18, 2023 • 20
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models Paper • 2312.00845 • Published Dec 1, 2023 • 39
CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields Paper • 2307.11526 • Published Jul 21, 2023 • 12
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language Paper • 2306.16410 • Published Jun 28, 2023 • 28
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 111