---
title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: a10g-small
suggested_storage: large
---

# 🎭 OmniAvatar-14B with HuggingFace TTS

An advanced AI avatar generation system that creates realistic talking avatars from text prompts and speech. This space combines the power of OmniAvatar-14B with HuggingFace SpeechT5 text-to-speech for seamless avatar creation.

## ✨ Features

- **🎯 Text-to-Avatar Generation**: Generate avatars from descriptive text prompts
- **🗣️ HuggingFace TTS Integration**: High-quality text-to-speech synthesis
- **🎵 Audio URL Support**: Use pre-generated audio files
- **🖼️ Image Reference Support**: Guide avatar appearance with reference images
- **⚡ Real-time Processing**: Fast generation with GPU acceleration
- **🎨 Customizable Parameters**: Fine-tune generation quality and lip-sync

## 🚀 How to Use

1. **Enter a Prompt**: Describe the character's behavior and appearance
2. **Choose Audio Source**:
   - Enter text for automatic speech generation
   - OR provide a direct audio URL
3. **Optional**: Add a reference image URL
4. **Customize**: Adjust voice, guidance scale, and generation parameters
5. **Generate**: Create your avatar video!
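
The steps above can be sketched as a small payload builder. This is only an illustration: the field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are hypothetical placeholders, since the Space's actual request schema is not documented here, and the rule that exactly one audio source is given is an assumption based on the "OR" in step 2.

```python
from typing import Optional


def build_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    audio_url: Optional[str] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Assemble a request body. All field names are hypothetical."""
    # Step 1: a prompt describing the character is always needed.
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Step 2 (assumed rule): exactly one audio source, text for TTS or a URL.
    if (text_to_speech is None) == (audio_url is None):
        raise ValueError("provide exactly one of text_to_speech or audio_url")
    payload = {"prompt": prompt}
    if text_to_speech is not None:
        payload["text_to_speech"] = text_to_speech
    else:
        payload["audio_url"] = audio_url
    # Step 3: the reference image is optional.
    if image_url is not None:
        payload["image_url"] = image_url
    return payload
```

Calling `build_payload("A friendly presenter", text_to_speech="Hello")` returns a dictionary with just the prompt and the text to synthesize; passing both audio sources, or neither, raises `ValueError`.
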

## 🛠️ Parameters

- **Guidance Scale** (4-6 recommended): Controls how closely the model follows your prompt
- **Audio Scale** (3-5 recommended): Higher values improve lip-sync accuracy
- **Number of Steps** (20-50 recommended): More steps = higher quality, longer processing time

## 📝 Example Prompts

- "A professional teacher explaining a mathematical concept with clear gestures"
- "A friendly presenter speaking confidently to an audience"
- "A news anchor delivering the morning headlines with professional demeanor"

## 🔧 Technical Details

- **Model**: OmniAvatar-14B for video generation
- **TTS**: Microsoft SpeechT5 (HuggingFace) for high-quality speech synthesis
- **Framework**: FastAPI + Gradio interface
- **GPU**: Optimized for T4 and higher
- **Storage**: Requires large storage due to 14B parameter models (~70GB total)

## 🎮 API Endpoints

- `GET /health` - Check system status
- `POST /generate` - Generate avatar video
- `/gradio` - Interactive web interface

## 🔐 No API Keys Required

This space uses open-source HuggingFace models for text-to-speech. No external API keys or accounts needed!

## 📄 License

Apache 2.0 - See LICENSE file for details

---

*Powered by OmniAvatar-14B and HuggingFace TTS*

**Note**: This space requires large storage capacity due to the 14B parameter models. The models are downloaded on first startup and cached for subsequent uses.
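
For readers who want to script against the endpoints listed under API Endpoints, here is a minimal client sketch using only the Python standard library. It rests on several assumptions: the JSON request body, the field names (`guidance_scale`, `audio_scale`, `num_steps`), and the base URL are all hypothetical and may not match the Space's real schema. The recommended ranges are the ones given in the Parameters section.

```python
import json
import urllib.request

# Recommended ranges from the Parameters section; the keys are hypothetical names.
RECOMMENDED = {
    "guidance_scale": (4, 6),
    "audio_scale": (3, 5),
    "num_steps": (20, 50),
}


def out_of_range(params: dict) -> list:
    """Return the names of parameters that fall outside the recommended ranges."""
    return [
        name
        for name, (low, high) in RECOMMENDED.items()
        if name in params and not (low <= params[name] <= high)
    ]


def health_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /health."""
    return urllib.request.Request(base_url.rstrip("/") + "/health", method="GET")


def generate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /generate.

    Sending the payload as JSON is an assumption, not a documented contract.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the sketch easy to inspect and test without a running Space.
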