Hugging Face
emjay73's Collections

multimodal

updated Jun 21

  • Gemini: A Family of Highly Capable Multimodal Models

    Paper • 2312.11805 • Published Dec 19, 2023 • 47

  • VCoder: Versatile Vision Encoders for Multimodal Large Language Models

    Paper • 2312.14233 • Published Dec 21, 2023 • 17

  • Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

    Paper • 2405.18669 • Published May 29, 2024 • 12

  • Ming-Omni: A Unified Multimodal Model for Perception and Generation

    Paper • 2506.09344 • Published Jun 11 • 28