title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: t4-medium
suggested_storage: medium
🎭 OmniAvatar-14B with ElevenLabs TTS
An AI avatar generation system that creates realistic talking avatars from text prompts and speech. This Space pairs OmniAvatar-14B, which generates the video, with ElevenLabs text-to-speech, which supplies the voice.
✨ Features
- 🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
- 🗣️ ElevenLabs Integration: High-quality text-to-speech synthesis
- 🎵 Audio URL Support: Use pre-generated audio files
- 🖼️ Image Reference Support: Guide avatar appearance with reference images
- ⚡ Real-time Processing: Fast generation with GPU acceleration
- 🎨 Customizable Parameters: Fine-tune generation quality and lip-sync
How to Use
- Enter a Prompt: Describe the character's behavior and appearance
- Choose Audio Source:
  - Enter text for automatic speech generation
  - OR provide a direct audio URL
- Optional: Add a reference image URL
- Customize: Adjust voice, guidance scale, and generation parameters
- Generate: Create your avatar video!
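The audio-source step above is an either/or choice: text to be synthesized, or a URL to existing audio. A minimal Python sketch of that rule follows. The field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are illustrative assumptions rather than the Space's confirmed request schema, and treating "both supplied" as an error is also an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AvatarRequest:
    """Hypothetical request shape; every field name here is an assumption."""

    prompt: str
    text_to_speech: Optional[str] = None  # text to turn into speech
    audio_url: Optional[str] = None       # link to pre-generated audio
    image_url: Optional[str] = None       # optional reference image

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        has_text = bool(self.text_to_speech)
        has_audio = bool(self.audio_url)
        # Exactly one audio source must be present.
        if has_text == has_audio:
            raise ValueError("provide exactly one of text_to_speech or audio_url")

    def to_payload(self) -> dict:
        """Validate, then return a dict with unset optional fields dropped."""
        self.validate()
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Calling `to_payload()` on a request that has a prompt and one audio source returns a plain dictionary; a request with neither or both audio sources raises `ValueError`.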
🛠️ Parameters
- Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
- Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
- Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
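As an illustration, the recommended ranges above can be checked before a request is submitted. This is a sketch only: the parameter keys (`guidance_scale`, `audio_scale`, `num_steps`) are hypothetical names, and the ranges are recommendations from the list above, not hard limits enforced by the model.

```python
from typing import Dict, List, Tuple

# Recommended ranges copied from the parameter list; key names are assumptions.
RECOMMENDED: Dict[str, Tuple[float, float]] = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


def check_recommended(params: Dict[str, float]) -> List[str]:
    """Return one warning per supplied parameter that falls outside its range."""
    warnings = []
    for name, (low, high) in RECOMMENDED.items():
        if name not in params:
            continue  # parameters that were not supplied are not checked
        value = params[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}={value} is outside the recommended range {low}-{high}"
            )
    return warnings
```

Returning warnings instead of raising keeps out-of-range values usable for experimentation, which fits ranges that are described as recommended rather than required.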
Example Prompts
- "A professional teacher explaining a mathematical concept with clear gestures"
- "A friendly presenter speaking confidently to an audience"
- "A news anchor delivering the morning headlines with professional demeanor"
🔧 Technical Details
- Model: OmniAvatar-14B for video generation
- TTS: ElevenLabs API for high-quality speech synthesis
- Framework: FastAPI + Gradio interface
- GPU: Optimized for T4 and higher
API Endpoints
- GET /health - Check system status
- POST /generate - Generate avatar video
- /gradio - Interactive web interface
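A hedged sketch of calling the two HTTP endpoints above from Python, using only the standard library. The base URL, the JSON request body, and the long timeout are assumptions; the README does not document the request or response format, so check the running Space before relying on this.

```python
import json
import urllib.request

# Assumption: replace with the URL where your Space is actually served.
BASE_URL = "http://localhost:7860"


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the /health endpoint."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def generate_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /generate, assuming it accepts a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `send(health_request())` would fetch the status, and `send(generate_request({"prompt": "..."}))` would submit a job. Building the request separately from sending it makes the request easy to inspect without network access.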
Environment Variables
The space uses ElevenLabs for text-to-speech. For optimal performance, configure your ElevenLabs API key as a secret.
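Secrets configured on a Space are exposed to the running app as environment variables, so the key can be read at startup. The sketch below shows one way to do that; the variable name `ELEVENLABS_API_KEY` is a hypothetical choice, since the README does not state which name the app expects.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use whatever name the app actually reads.
KEY_VAR = "ELEVENLABS_API_KEY"


def load_elevenlabs_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key if it is set and non-empty, otherwise None."""
    key = env.get(KEY_VAR, "").strip()
    return key or None


def tts_available(env: Mapping[str, str] = os.environ) -> bool:
    """Report whether text-to-speech can be attempted with the current config."""
    return load_elevenlabs_key(env) is not None
```

Passing the environment as a parameter keeps the lookup testable, and returning `None` for a missing key lets the caller decide whether to fall back to an audio URL or report an error.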
License
Apache 2.0 - See LICENSE file for details
Powered by OmniAvatar-14B and ElevenLabs TTS