metadata
title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: t4-medium
suggested_storage: medium

🎭 OmniAvatar-14B with ElevenLabs TTS

This Space generates talking-avatar videos from a text prompt and speech audio. It pairs OmniAvatar-14B for video generation with ElevenLabs text-to-speech, so the speech track can either be synthesized from text or supplied as an audio URL.

✨ Features

  • 🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
  • 🗣️ ElevenLabs Integration: High-quality text-to-speech synthesis
  • 🎵 Audio URL Support: Use pre-generated audio files
  • 🖼️ Image Reference Support: Guide avatar appearance with reference images
  • ⚡ Real-time Processing: Fast generation with GPU acceleration
  • 🎨 Customizable Parameters: Fine-tune generation quality and lip-sync

🚀 How to Use

  1. Enter a Prompt: Describe the character's behavior and appearance
  2. Choose Audio Source:
    • Enter text for automatic speech generation
    • OR provide a direct audio URL
  3. Optional: Add a reference image URL
  4. Customize: Adjust voice, guidance scale, and generation parameters
  5. Generate: Create your avatar video!
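
The audio-source choice in step 2 amounts to an either/or check. The sketch below is one way to express it; the helper name and the returned keys are hypothetical, and treating the two sources as mutually exclusive is an assumption, since the Space's actual request handling is not documented here.

```python
from typing import Optional


def resolve_audio_source(text: Optional[str] = None,
                         audio_url: Optional[str] = None) -> dict:
    """Pick the audio source for a generation request.

    Hypothetical helper: assumes exactly one of `text` (for speech
    synthesis) or `audio_url` (pre-generated audio) is supplied.
    """
    has_text = bool(text and text.strip())
    has_url = bool(audio_url and audio_url.strip())
    if has_text == has_url:
        # Either both were given or neither was.
        raise ValueError("Provide either text or an audio URL, but not both.")
    if has_text:
        return {"source": "tts", "text": text.strip()}
    return {"source": "url", "audio_url": audio_url.strip()}
```

Calling `resolve_audio_source(text="Hello there")` selects speech synthesis, while passing only `audio_url` selects the pre-generated file.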

🛠️ Parameters

  • Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
  • Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
  • Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
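
As a rough illustration, the recommended ranges above can be checked before submitting a request. The parameter names `guidance_scale`, `audio_scale`, and `num_steps` are assumptions chosen for this sketch and may not match the names the Space actually uses.

```python
from typing import Dict, List, Tuple

# Recommended ranges taken from the list above; the key names are hypothetical.
RECOMMENDED_RANGES: Dict[str, Tuple[int, int]] = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


def check_recommended(**params: float) -> List[str]:
    """Return one note per parameter that falls outside its recommended range."""
    notes = []
    for name, value in params.items():
        if name not in RECOMMENDED_RANGES:
            continue  # Unknown parameters are ignored, not rejected.
        low, high = RECOMMENDED_RANGES[name]
        if not low <= value <= high:
            notes.append(
                f"{name}={value} is outside the recommended range {low}-{high}"
            )
    return notes
```

The check only reports values outside the recommended ranges; it does not reject them, because the ranges are recommendations rather than hard limits.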

📝 Example Prompts

  • "A professional teacher explaining a mathematical concept with clear gestures"
  • "A friendly presenter speaking confidently to an audience"
  • "A news anchor delivering the morning headlines with professional demeanor"

🔧 Technical Details

  • Model: OmniAvatar-14B for video generation
  • TTS: ElevenLabs API for high-quality speech synthesis
  • Framework: FastAPI + Gradio interface
  • GPU: Optimized for T4 and higher

🎮 API Endpoints

  • GET /health - Check system status
  • POST /generate - Generate avatar video
  • /gradio - Interactive web interface
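
A minimal client sketch for the two HTTP endpoints, using only the Python standard library. It assumes that `/generate` accepts a JSON body and that both endpoints return JSON, neither of which is confirmed here; the base URL and any payload field names are placeholders.

```python
import json
import urllib.request

# Assumption: replace with the URL of your running Space.
BASE_URL = "http://localhost:7860"


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the /health endpoint."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def generate_request(payload: dict,
                     base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /generate with a JSON body (assumed format)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send a request and decode the response as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs network calls, so it is left commented out):
# print(send(health_request()))
# print(send(generate_request({"prompt": "A friendly presenter"})))
```

The request builders are kept separate from `send` so the method, URL, and body can be inspected without contacting the server. The long default timeout reflects the fact that video generation can take a while.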

🔐 Environment Variables

The Space uses ElevenLabs for text-to-speech. Configure your ElevenLabs API key as a Space secret rather than committing it to the repository.
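
Hugging Face exposes Space secrets to the running container as environment variables, so the key can be read at startup. In the sketch below the variable name `ELEVENLABS_API_KEY` is an assumption; use whatever name the application actually expects.

```python
import os
from typing import Mapping, Optional


def load_elevenlabs_key(env: Mapping[str, str] = os.environ,
                        name: str = "ELEVENLABS_API_KEY") -> Optional[str]:
    """Return the API key from the environment, or None if it is not set.

    The default variable name is hypothetical, not confirmed by this README.
    """
    key = env.get(name, "").strip()
    return key or None
```

Returning `None` instead of raising lets the caller decide how to handle a missing key, for example by accepting only audio URLs.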

📄 License

Apache 2.0 - See LICENSE file for details


Powered by OmniAvatar-14B and ElevenLabs TTS