title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: a10g-small
suggested_storage: large

🎭 OmniAvatar-14B with HuggingFace TTS

An AI avatar generation system that creates realistic talking avatars from text prompts and speech. This Space pairs OmniAvatar-14B video generation with HuggingFace SpeechT5 text-to-speech, so typed text can be turned into speech and then into a lip-synced avatar video.

✨ Features

🎯 Text-to-Avatar Generation: Generate avatars from descriptive text prompts
🗣️ HuggingFace TTS Integration: High-quality text-to-speech synthesis
🎵 Audio URL Support: Use pre-generated audio files
🖼️ Image Reference Support: Guide avatar appearance with reference images
⚡ GPU Acceleration: Generation runs on the GPU for faster processing
🎨 Customizable Parameters: Fine-tune generation quality and lip-sync

🚀 How to Use

Enter a Prompt: Describe the character's behavior and appearance
Choose Audio Source:
- Enter text for automatic speech generation
- OR provide a direct audio URL
Optional: Add a reference image URL
Customize: Adjust voice, guidance scale, and generation parameters
Generate: Create your avatar video!
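
The steps above amount to filling in one request: a prompt, exactly one audio source, and an optional reference image. The sketch below assembles such a request and enforces the "text OR audio URL" rule. The field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are assumptions made for illustration; the Space's actual request schema is not documented in this README.

```python
from typing import Optional


def build_request(
    prompt: str,
    text_to_speech: Optional[str] = None,
    audio_url: Optional[str] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Assemble a generation request (all field names are hypothetical)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Exactly one audio source: text for speech synthesis, or a direct audio URL.
    if (text_to_speech is None) == (audio_url is None):
        raise ValueError("provide exactly one of text_to_speech or audio_url")
    payload = {"prompt": prompt}
    if text_to_speech is not None:
        payload["text_to_speech"] = text_to_speech
    else:
        payload["audio_url"] = audio_url
    # The reference image is optional and only included when given.
    if image_url is not None:
        payload["image_url"] = image_url
    return payload
```

Calling `build_request("A friendly presenter speaking confidently to an audience", text_to_speech="Hello!")` yields a dictionary with just the prompt and the text; passing both audio sources, or neither, raises `ValueError`.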

🛠️ Parameters

Guidance Scale (4-6 recommended): Controls how closely the model follows your prompt
Audio Scale (3-5 recommended): Higher values improve lip-sync accuracy
Number of Steps (20-50 recommended): More steps = higher quality, longer processing time
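
As a quick illustration of the recommended ranges above, the following sketch flags values that fall outside them. The parameter names (`guidance_scale`, `audio_scale`, `num_steps`) are assumed for the example and may not match the names the app uses; the ranges are the ones listed in this section. Values outside a range are reported rather than rejected, since the ranges are recommendations.

```python
# Recommended ranges from the Parameters section (names are hypothetical).
RECOMMENDED = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


def check_parameters(**params) -> list:
    """Return one warning string per value outside its recommended range."""
    warnings = []
    for name, value in params.items():
        if name not in RECOMMENDED:
            raise KeyError(f"unknown parameter: {name}")
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}={value} is outside the recommended range {low}-{high}"
            )
    return warnings
```

For instance, `check_parameters(guidance_scale=5, audio_scale=4, num_steps=30)` returns an empty list, while `num_steps=10` produces a warning.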

📝 Example Prompts

"A professional teacher explaining a mathematical concept with clear gestures"
"A friendly presenter speaking confidently to an audience"
"A news anchor delivering the morning headlines with professional demeanor"

🔧 Technical Details

Model: OmniAvatar-14B for video generation
TTS: Microsoft SpeechT5 (HuggingFace) for high-quality speech synthesis
Framework: FastAPI + Gradio interface
GPU: Optimized for T4 and higher
Storage: Requires large storage due to 14B parameter models (~70GB total)
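
Because the models take roughly 70GB, it can help to confirm free disk space before the first startup. This is a minimal sketch using only the standard library; the helper name `has_enough_space` is made up for this example, and the 70GB default is taken from the approximate figure above, not from a measured requirement.

```python
import shutil


def has_enough_space(path: str, required_gb: float = 70.0) -> bool:
    """Check whether the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    # Compare against the requirement expressed in bytes (1 GB = 1024**3 here).
    return free_bytes >= required_gb * 1024**3
```

Point `path` at whatever directory the model cache lives in; that location depends on the deployment and is not specified in this README.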

🎮 API Endpoints

GET /health - Check system status
POST /generate - Generate avatar video
/gradio - Interactive web interface
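
A client for the two HTTP endpoints above could look roughly like the sketch below, which uses only the standard library. Several things here are assumptions: that `/generate` accepts a JSON body (common for FastAPI apps, but not confirmed by this README), the base URL, and the shape of the response, which is why `send` returns the raw bytes without interpreting them.

```python
import json
import urllib.request


def health_request(base_url: str) -> urllib.request.Request:
    """Build a GET request for the /health endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/health", method="GET")


def generate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for /generate (JSON body is an assumption)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    # Video generation can be slow, hence the generous default timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything goes over the network. A call such as `send(health_request(base_url))` would then check system status, where `base_url` is the address of your running Space.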

🔐 No API Keys Required

This Space uses open-source HuggingFace models for text-to-speech. No external API keys or accounts are needed!

📄 License

Apache 2.0 - See LICENSE file for details

Powered by OmniAvatar-14B and HuggingFace TTS

Note: This space requires large storage capacity due to the 14B parameter models. The models are downloaded on first startup and cached for subsequent uses.