bravedims
Replace ElevenLabs with HuggingFace TTS (SpeechT5)
metadata
title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: a10g-small
suggested_storage: large

🎭 OmniAvatar-14B with HuggingFace TTS

An AI avatar generation system that creates realistic talking-avatar videos from text prompts and speech. This Space pairs OmniAvatar-14B for video generation with HuggingFace SpeechT5 for text-to-speech.

✨ Features

  • 🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
  • 🗣️ HuggingFace TTS Integration: High-quality text-to-speech synthesis
  • 🎵 Audio URL Support: Use pre-generated audio files
  • 🖼️ Image Reference Support: Guide avatar appearance with reference images
  • ⚡ Real-time Processing: Fast generation with GPU acceleration
  • 🎨 Customizable Parameters: Fine-tune generation quality and lip-sync

🚀 How to Use

  1. Enter a Prompt: Describe the character's behavior and appearance
  2. Choose Audio Source:
    • Enter text for automatic speech generation
    • OR provide a direct audio URL
  3. Optional: Add a reference image URL
  4. Customize: Adjust voice, guidance scale, and generation parameters
  5. Generate: Create your avatar video!
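
Step 2 offers two alternative audio sources. The sketch below shows one way to enforce that choice before submitting a request. The key names `text_to_speech` and `audio_url` are hypothetical; this README does not document the Space's actual request schema.

```python
from typing import Optional


def choose_audio_source(text: Optional[str] = None,
                        audio_url: Optional[str] = None) -> dict:
    """Return exactly one audio source, mirroring step 2 above.

    The key names are hypothetical and not taken from the Space's code.
    """
    has_text = bool(text and text.strip())
    has_url = bool(audio_url and audio_url.strip())
    if has_text == has_url:
        # Either both sources were given or neither was.
        raise ValueError("Provide text for speech generation OR an audio URL")
    if has_text:
        return {"text_to_speech": text.strip()}
    return {"audio_url": audio_url.strip()}
```

Passing only `text` selects automatic speech generation; passing only `audio_url` selects the pre-generated file.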

🛠️ Parameters

  • Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
  • Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
  • Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
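
The recommended ranges above can be checked before a run. This is a minimal sketch: the names `guidance_scale`, `audio_scale`, and `num_steps` are assumed for illustration and may not match the identifiers the Space actually uses.

```python
# Recommended ranges copied from the parameter list above.
# The dictionary keys are assumed names, not confirmed identifiers.
RECOMMENDED = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


def out_of_range(params: dict) -> list:
    """Return the names of parameters outside their recommended range."""
    flagged = []
    for name, (low, high) in RECOMMENDED.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


print(out_of_range({"guidance_scale": 5, "audio_scale": 8, "num_steps": 30}))
# → ['audio_scale']
```

Values outside these ranges are not necessarily invalid; the ranges are recommendations, so the helper only flags them.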

📝 Example Prompts

  • "A professional teacher explaining a mathematical concept with clear gestures"
  • "A friendly presenter speaking confidently to an audience"
  • "A news anchor delivering the morning headlines with professional demeanor"

🔧 Technical Details

  • Model: OmniAvatar-14B for video generation
  • TTS: Microsoft SpeechT5 (HuggingFace) for high-quality speech synthesis
  • Framework: FastAPI + Gradio interface
  • GPU: Optimized for T4 and higher
  • Storage: Requires large storage due to 14B parameter models (~70 GB total)

🎮 API Endpoints

  • GET /health - Check system status
  • POST /generate - Generate avatar video
  • /gradio - Interactive web interface
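
A rough sketch of calling these endpoints from Python's standard library follows. Only the paths `/health` and `/generate` come from the list above. The JSON body, the payload field names, and the base URL are assumptions, so check the Space's code for the real schema.

```python
import json
import urllib.request


def build_generate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request.

    Assumes the endpoint accepts a JSON body; that is not confirmed here.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_health(base_url: str, timeout: float = 10.0) -> int:
    """Call GET /health and return the HTTP status code."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


# Hypothetical base URL and field names, for illustration only.
request = build_generate_request(
    "http://localhost:7860",
    {"prompt": "A friendly presenter speaking confidently to an audience"},
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)`. Generation may take a while, so a generous timeout is advisable.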

🔐 No API Keys Required

This space uses open-source HuggingFace models for text-to-speech. No external API keys or accounts needed!

📄 License

Apache 2.0 - See LICENSE file for details


Powered by OmniAvatar-14B and HuggingFace TTS

Note: This space requires large storage capacity due to the 14B parameter models. The models are downloaded on first startup and cached for subsequent uses.