metadata
title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: a10g-small
suggested_storage: large
🎭 OmniAvatar-14B with HuggingFace TTS
An AI avatar generation system that creates realistic talking-avatar videos from text prompts and speech. This Space pairs OmniAvatar-14B for video generation with HuggingFace SpeechT5 for text-to-speech.
✨ Features
- 🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
- 🗣️ HuggingFace TTS Integration: High-quality text-to-speech synthesis
- 🎵 Audio URL Support: Use pre-generated audio files
- 🖼️ Image Reference Support: Guide avatar appearance with reference images
- ⚡ Real-time Processing: Fast generation with GPU acceleration
- 🎨 Customizable Parameters: Fine-tune generation quality and lip-sync
How to Use
- Enter a Prompt: Describe the character's behavior and appearance
- Choose Audio Source:
  - Enter text for automatic speech generation
  - OR provide a direct audio URL
- Optional: Add a reference image URL
- Customize: Adjust voice, guidance scale, and generation parameters
- Generate: Create your avatar video!
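The audio-source step above is an either/or choice: text for speech synthesis, or a direct audio URL. Below is a minimal sketch of that rule, assuming the two options are mutually exclusive. The function name and the field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are hypothetical and are not taken from this Space's actual request schema.

```python
from typing import Optional


def build_inputs(prompt: str,
                 text: Optional[str] = None,
                 audio_url: Optional[str] = None,
                 image_url: Optional[str] = None) -> dict:
    """Assemble generation inputs; all field names are illustrative assumptions."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Exactly one audio source: text for TTS, or a pre-generated audio URL.
    if bool(text) == bool(audio_url):
        raise ValueError("provide either text or audio_url, not both or neither")
    inputs = {"prompt": prompt}
    if text:
        inputs["text_to_speech"] = text
    else:
        inputs["audio_url"] = audio_url
    # The reference image is optional.
    if image_url:
        inputs["image_url"] = image_url
    return inputs
```

Rejecting the "both" and "neither" cases up front keeps an ambiguous request from reaching the GPU-heavy generation step.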
🛠️ Parameters
- Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
- Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
- Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
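The recommended ranges above can be checked before a request is submitted. This is a small sketch only: the parameter keys (`guidance_scale`, `audio_scale`, `num_steps`) are assumed names, and the ranges are copied from the list above rather than enforced by the model.

```python
# Recommended ranges taken from the parameter list above.
RECOMMENDED = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def check_params(**params) -> list:
    """Return one warning string per value outside its recommended range."""
    warnings = []
    for name, value in params.items():
        if name not in RECOMMENDED:
            raise KeyError(f"unknown parameter: {name}")
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside the recommended {low}-{high}")
    return warnings
```

The function warns rather than rejects, since values outside the recommended ranges may still be valid, just slower or lower quality.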
Example Prompts
- "A professional teacher explaining a mathematical concept with clear gestures"
- "A friendly presenter speaking confidently to an audience"
- "A news anchor delivering the morning headlines with professional demeanor"
🔧 Technical Details
- Model: OmniAvatar-14B for video generation
- TTS: Microsoft SpeechT5 (HuggingFace) for high-quality speech synthesis
- Framework: FastAPI + Gradio interface
- GPU: Optimized for T4 and higher
- Storage: Requires large storage due to 14B parameter models (~70GB total)
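Because the models total roughly 70GB, it can help to confirm free disk space before the first startup. The sketch below uses the standard-library `shutil.disk_usage`. The function name and the idea of checking one cache path are assumptions, not part of this Space's code.

```python
import shutil

REQUIRED_GB = 70  # approximate total from the storage note above


def has_room_for_models(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

Point `path` at whichever directory the model cache lives in, since that may be a different volume from the working directory.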
API Endpoints
- GET /health - Check system status
- POST /generate - Generate avatar video
- /gradio - Interactive web interface
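A rough sketch of calling these endpoints from Python with the standard library only. Several things here are assumptions rather than documented behavior: the base URL and port, the JSON request and response format, and every payload field name. Check the running Space for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local Docker run on this port


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /health request."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def generate_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /generate request with a JSON body (assumed format)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would be `send(health_request())` to confirm the service is up, then `send(generate_request({...}))` with the prompt and audio fields. The long default timeout reflects that video generation can take a while.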
No API Keys Required
This space uses open-source HuggingFace models for text-to-speech. No external API keys or accounts needed!
License
Apache 2.0 - See LICENSE file for details
Powered by OmniAvatar-14B and HuggingFace TTS
Note: This space requires large storage capacity due to the 14B parameter models. The models are downloaded on first startup and cached for subsequent uses.