---
title: AI Avatar Chat
emoji: 🎭
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
suggested_hardware: a10g-small
suggested_storage: large
---

# 🎭 OmniAvatar-14B with HuggingFace TTS

An AI avatar generation system that creates talking-avatar videos from text prompts and speech. This Space pairs OmniAvatar-14B video generation with HuggingFace SpeechT5 text-to-speech, so an avatar can be driven either by typed text or by existing audio.

## ✨ Features

- **🎯 Text-to-Avatar Generation**: Generate avatars from descriptive text prompts
- **🗣️ HuggingFace TTS Integration**: High-quality text-to-speech synthesis
- **🎵 Audio URL Support**: Drive the avatar with a pre-generated audio file supplied by URL
- **🖼️ Image Reference Support**: Guide avatar appearance with reference images
- **⚡ GPU-Accelerated Processing**: Video generation runs on GPU hardware
- **🎨 Customizable Parameters**: Fine-tune generation quality and lip-sync

## 🚀 How to Use

1. **Enter a Prompt**: Describe the character's behavior and appearance
2. **Choose Audio Source**: 
   - Enter text for automatic speech generation
   - OR provide a direct audio URL
3. **Optional**: Add a reference image URL
4. **Customize**: Adjust voice, guidance scale, and generation parameters
5. **Generate**: Create your avatar video!
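Step 2 treats typed text and an audio URL as alternative audio sources. A client can check this before submitting a job. The sketch below assumes the two sources are mutually exclusive (one and only one is given), and the field names (`prompt`, `text_to_speech`, `audio_url`, `image_url`) are hypothetical, not taken from the Space's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvatarRequest:
    # Hypothetical field names; adapt them to the Space's real request schema.
    prompt: str
    text_to_speech: Optional[str] = None  # step 2, option A: text to synthesize
    audio_url: Optional[str] = None       # step 2, option B: existing audio
    image_url: Optional[str] = None       # step 3: optional reference image

    def validate(self) -> None:
        """Raise ValueError if the request breaks the How to Use steps."""
        if not self.prompt.strip():
            raise ValueError("A prompt describing the character is required.")
        has_text = bool(self.text_to_speech and self.text_to_speech.strip())
        has_url = bool(self.audio_url and self.audio_url.strip())
        # Assumption: exactly one audio source must be provided.
        if has_text == has_url:
            raise ValueError("Provide either text_to_speech or audio_url, not both or neither.")


req = AvatarRequest(
    prompt="A friendly presenter speaking confidently to an audience",
    text_to_speech="Hello and welcome!",
)
req.validate()  # passes: exactly one audio source is set
```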

## 🛠️ Parameters

- **Guidance Scale** (4-6 recommended): Controls how closely the model follows your prompt
- **Audio Scale** (3-5 recommended): Higher values improve lip-sync accuracy
- **Number of Steps** (20-50 recommended): More steps generally give higher quality at the cost of longer processing time
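The recommended ranges above can be encoded as a small pre-flight check that warns before a long generation starts. This is a sketch only: the parameter keys (`guidance_scale`, `audio_scale`, `num_steps`) are hypothetical names, and values outside the ranges are still allowed, just flagged.

```python
# Recommended ranges from the Parameters list (inclusive bounds).
# The key names are hypothetical, not necessarily the Space's own.
RECOMMENDED = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def out_of_range(params: dict) -> list[str]:
    """Return a warning for each known parameter outside its recommended range."""
    warnings = []
    for name, value in params.items():
        if name not in RECOMMENDED:
            continue  # unknown parameters are ignored, not rejected
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside the recommended range {low}-{high}")
    return warnings


print(out_of_range({"guidance_scale": 5.0, "audio_scale": 7.0, "num_steps": 30}))
# ['audio_scale=7.0 is outside the recommended range 3.0-5.0']
```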

## 📝 Example Prompts

- "A professional teacher explaining a mathematical concept with clear gestures"
- "A friendly presenter speaking confidently to an audience"
- "A news anchor delivering the morning headlines with professional demeanor"

## 🔧 Technical Details

- **Model**: OmniAvatar-14B for video generation
- **TTS**: Microsoft SpeechT5 (HuggingFace) for high-quality speech synthesis
- **Framework**: FastAPI + Gradio interface
- **GPU**: Optimized for T4 and higher; the suggested Space hardware is an A10G (`a10g-small`)
- **Storage**: Requires large storage for the 14B-parameter model files (~70GB total)
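For a sense of scale: a 14B-parameter model stored in a 16-bit format (2 bytes per parameter) needs about 14 × 10⁹ × 2 = 28 GB for its weights alone. The 16-bit precision is an assumption, and this README does not break down the ~70GB total. The sketch below does that arithmetic and uses the standard library's `shutil.disk_usage` to check free space against the ~70GB figure before the first-startup download; the path checked is a placeholder.

```python
import shutil

GB = 10**9  # decimal gigabytes


def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of raw model weights; ignores metadata and other files."""
    return num_params * bytes_per_param / GB


def has_enough_space(path: str, required_gb: float = 70.0) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return shutil.disk_usage(path).free >= required_gb * GB


print(f"14B weights at 2 bytes/param: ~{weight_size_gb(14e9):.0f} GB")
# Placeholder path: point this at the directory where models will be cached.
print("Enough free space for ~70GB:", has_enough_space("."))
```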

## 🎮 API Endpoints

- `GET /health` - Check system status
- `POST /generate` - Generate avatar video
- `/gradio` - Interactive web interface
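A minimal client sketch for the two HTTP endpoints, using only the standard library's `urllib.request`. Several details are assumptions not stated in this README: the base URL (port 7860 is the usual default for Docker Spaces, but use your Space's actual URL), that both endpoints exchange JSON, and every payload field name.

```python
import json
import urllib.request

# Assumption: the app is reachable here; replace with your Space's URL.
BASE_URL = "http://localhost:7860"


def check_health(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Call GET /health and decode the response, assuming a JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_generate_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /generate request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, base_url: str = BASE_URL, timeout: float = 600.0) -> dict:
    """Send the request; the long timeout (an arbitrary choice) allows for slow video jobs."""
    request = build_generate_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical payload fields; check the Space's /generate schema before use.
example_payload = {
    "prompt": "A news anchor delivering the morning headlines with professional demeanor",
    "text_to_speech": "Good morning, here are today's headlines.",
    "guidance_scale": 5.0,
    "audio_scale": 3.5,
    "num_steps": 30,
}
```

Splitting request construction from sending keeps the payload inspectable and testable without a running server; call `check_health()` first, then `generate(example_payload)` once the Space reports it is ready.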

## 🔐 No API Keys Required

This Space uses open-source HuggingFace models for text-to-speech, so no external API keys or accounts are needed.

## 📄 License

Apache 2.0 - See LICENSE file for details

---

*Powered by OmniAvatar-14B and HuggingFace TTS*

**Note**: This Space requires large storage capacity for the 14B-parameter models. The models are downloaded on first startup and cached for subsequent runs.