AI_Avatar_Chat / README.md

bravedims

Deploy OmniAvatar-14B with ElevenLabs TTS integration to Hugging Face Spaces

bd1f2b1 3 months ago


metadata

title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: t4-medium
suggested_storage: medium

🎭 OmniAvatar-14B with ElevenLabs TTS

An AI avatar generation system that creates talking-avatar videos from text prompts and speech. This Space pairs OmniAvatar-14B for video generation with ElevenLabs text-to-speech for the audio track.

✨ Features

🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
🗣️ ElevenLabs Integration: High-quality text-to-speech synthesis
🎵 Audio URL Support: Use pre-generated audio files
🖼️ Image Reference Support: Guide avatar appearance with reference images
⚡ GPU Acceleration: Generation runs on a GPU (T4 or higher) to keep processing time down
🎨 Customizable Parameters: Fine-tune generation quality and lip-sync

🚀 How to Use

1. Enter a Prompt: Describe the character's behavior and appearance
2. Choose Audio Source:
   - Enter text for automatic speech generation
   - OR provide a direct audio URL
3. Optional: Add a reference image URL
4. Customize: Adjust voice, guidance scale, and generation parameters
5. Generate: Create your avatar video!
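
The steps above can be sketched as a small helper that assembles the inputs before generation. This is an illustration only: the field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are hypothetical and not confirmed by this README, and treating the two audio sources as mutually exclusive is an assumption based on the "OR" in the audio source step.

```python
from typing import Optional


def build_payload(
    prompt: str,
    text: Optional[str] = None,
    audio_url: Optional[str] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Assemble avatar inputs; all field names here are hypothetical."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: exactly one audio source is given, either text or a URL.
    if (text is None) == (audio_url is None):
        raise ValueError("provide exactly one of text or audio_url")

    payload = {"prompt": prompt}
    if text is not None:
        payload["text_to_speech"] = text
    else:
        payload["audio_url"] = audio_url
    # The reference image is optional, so it is added only when supplied.
    if image_url is not None:
        payload["image_url"] = image_url
    return payload
```

Rejecting ambiguous input early keeps the error close to the caller instead of surfacing later as a failed generation.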

🛠️ Parameters

Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
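
As a rough sketch, the recommended ranges above can be checked before submitting a job. The ranges come from the list above; the field names and default values are hypothetical and may not match the names the Space actually uses.

```python
from dataclasses import dataclass

# Recommended ranges taken from the parameter list above.
RECOMMENDED = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


@dataclass
class GenerationParams:
    # Hypothetical field names and defaults, chosen inside each range.
    guidance_scale: float = 5.0
    audio_scale: float = 4.0
    num_steps: int = 30


def out_of_range(params: GenerationParams) -> list:
    """Return the names of parameters outside their recommended range."""
    flagged = []
    for name, (low, high) in RECOMMENDED.items():
        value = getattr(params, name)
        if not low <= value <= high:
            flagged.append(name)
    return flagged
```

Values outside these ranges are not necessarily invalid, so the helper reports them rather than raising an error.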

📝 Example Prompts

"A professional teacher explaining a mathematical concept with clear gestures"
"A friendly presenter speaking confidently to an audience"
"A news anchor delivering the morning headlines with professional demeanor"

🔧 Technical Details

Model: OmniAvatar-14B for video generation
TTS: ElevenLabs API for high-quality speech synthesis
Framework: FastAPI + Gradio interface
GPU: Optimized for T4 and higher

🎮 API Endpoints

GET /health - Check system status
POST /generate - Generate avatar video
/gradio - Interactive web interface
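
A minimal client sketch for the two HTTP endpoints above, using only the Python standard library. The base URL is a placeholder, and the assumption that `POST /generate` accepts a JSON body is not confirmed by this README; check the running app's schema before relying on it.

```python
import json
import urllib.request


def build_health_request(base_url: str) -> urllib.request.Request:
    """Build a GET request for the /health endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/health", method="GET")


def build_generate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for /generate (JSON body is an assumption)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running server. A call such as `send(build_health_request("http://localhost:7860"))` would then check the status, where the host and port are placeholders for wherever the app is served.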

🔐 Environment Variables

The Space uses ElevenLabs for text-to-speech. Configure your ElevenLabs API key as a Space secret so that it stays out of the repository.
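
Hugging Face Spaces exposes secrets to the running container as environment variables, so the key can be read at startup. The sketch below assumes the secret is named `ELEVENLABS_API_KEY`; this README does not state the actual variable name, so adjust it to match the app.

```python
import os
from typing import Mapping, Optional

# Hypothetical secret name; the README does not specify the real one.
API_KEY_VAR = "ELEVENLABS_API_KEY"


def load_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key from the environment, or None if unset or blank."""
    value = env.get(API_KEY_VAR, "").strip()
    return value or None
```

Returning `None` instead of raising lets the caller decide whether to disable text-to-speech or stop with a clear error.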

📄 License

Apache 2.0 - See LICENSE file for details