Spaces: utarn/ai_ocr


utarn committed on Sep 23

Commit 9b7ef55

1 Parent(s): 1a8c8de

Add readme and model card

Files changed (4)

README.md +24 -1
model_card.md +110 -0
requirements.txt +5 -0
spaces_config.json +6 -0

README.md CHANGED

@@ -1,3 +1,14 @@
+---
+title: Omni API Gradio UI
+emoji: 🤖
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: "4.0.0"
+app_file: app.py
+pinned: false
+---
 # Omni API Gradio UI
 A Gradio-based user interface for the Omni API that supports text, PDF, image, and audio file processing.
@@ -42,4 +53,16 @@ This will monitor for changes in Python files, Markdown files, and TOML configur
 - **PDFs**: Document processing
 - **Images**: JPG, PNG, GIF, BMP, WEBP
-- **Audio**: MP3, WAV, M4A, FLAC, OGG
+- **Audio**: MP3, WAV, M4A, FLAC, OGG
+---
+tags:
+- gradio
+- omni-api
+- multimodal
+- chat-interface
+- pdf-processing
+- image-processing
+- audio-processing
+- llm
+- api-client

model_card.md ADDED

@@ -0,0 +1,110 @@

+---
+license: mit
+tags:
+- gradio
+- omni-api
+- multimodal
+- chat-interface
+- pdf-processing
+- image-processing
+- audio-processing
+- llm
+- api-client
+- chatbot
+- text-generation
+- document-analysis
+- ocr
+- transcription
+widget:
+- src: https://api.modelharbor.com
+---
+# Omni API Gradio UI
+This is a Gradio-based user interface for the Omni API that supports multimodal interactions with various file types including text, PDF documents, images, and audio files.
+## Model Description
+The Omni API Gradio UI provides an easy-to-use web interface for interacting with the Omni API, which supports advanced multimodal AI capabilities. Users can send text prompts along with various file types and receive intelligent responses.
+### Supported Models
+The interface supports several state-of-the-art models:
+- typhoon-ocr-preview
+- openai/gpt-5
+- meta-llama/llama-4-maverick
+- qwen/qwen3-235b-a22b-instruct-2507
+- gemini/gemini-2.5-pro
+- gemini/gemini-2.5-flash
+## Features
+- **Multimodal Support**: Process text, PDFs, images, and audio files in a single interface
+- **File Ordering**: Upload multiple files in a specific order for precise control
+- **Configurable Models**: Switch between different AI models for different tasks
+- **Real-time Responses**: Get immediate feedback from the API
+- **Customizable Parameters**: Adjust max tokens and other settings
+## Intended Uses & Limitations
+### Intended Uses
+- Document analysis and summarization
+- Image OCR and analysis
+- Audio transcription and analysis
+- Multimodal chat applications
+- Content extraction from various file formats
+### Limitations
+- Requires access to the Omni API
+- Dependent on network connectivity
+- File size limitations based on API constraints
+- Some models may require API keys
+## How to Use
+1. Configure the API base URL (defaults to https://api.modelharbor.com)
+2. Select your preferred model from the dropdown
+3. Enter your text message in the input box
+4. Upload files (PDF, images, or audio) as needed
+5. Click "Send Request" to interact with the API
+6. View the response in the output panel
+### Supported File Types
+- **PDFs**: Document processing and analysis
+- **Images**: JPG, PNG, GIF, BMP, WEBP for OCR and visual analysis
+- **Audio**: MP3, WAV, M4A, FLAC, OGG for transcription
+## Technical Details
+### Frameworks and Libraries
+- Gradio 4.0+
+- Python 3.8+
+- Requests library for API communication
+### Installation
+```bash
+# Install dependencies
+uv sync
+# Run the application
+uv run python app.py
+```
+### Development Mode
+```bash
+# Run with auto-reload for development
+uv run python dev.py
+```
+## Citation
+If you use this interface in your work, please cite:
+```
+@misc{omni_api_gradio_ui,
+  title={Omni API Gradio UI},
+  author={ModelHarbor Team},
+  year={2025},
+  howpublished={\url{https://github.com/your-username/omni-api-gradio-ui}}
+}
+```
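
The "Supported File Types" and "File Ordering" entries in the model card above can be illustrated with a small sketch. This is not code from the commit: `classify_upload` and `plan_uploads` are hypothetical helper names, and the only facts taken from the diff are the extension lists. Note that `.jpeg` is not in the card's list, so the sketch rejects it.

```python
from pathlib import Path
from typing import List, Tuple

# Extension lists copied from the "Supported File Types" section of model_card.md.
SUPPORTED_EXTENSIONS = {
    "pdf": {".pdf"},
    "image": {".jpg", ".png", ".gif", ".bmp", ".webp"},
    "audio": {".mp3", ".wav", ".m4a", ".flac", ".ogg"},
}


def classify_upload(filename: str) -> str:
    """Return 'pdf', 'image', or 'audio' for a file name with a listed extension.

    Hypothetical helper: raises ValueError for anything outside the documented lists.
    """
    suffix = Path(filename).suffix.lower()
    for kind, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return kind
    raise ValueError(f"Unsupported file type: {filename!r}")


def plan_uploads(filenames: List[str]) -> List[Tuple[int, str, str]]:
    """Keep the caller's ordering and tag each file with its position and kind."""
    return [
        (position, name, classify_upload(name))
        for position, name in enumerate(filenames, start=1)
    ]
```

A call such as `plan_uploads(["contract.pdf", "scan.PNG", "memo.wav"])` keeps the three files in the order given and labels them `pdf`, `image`, and `audio`. How the real `app.py` validates or orders uploads is not shown in this diff.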

requirements.txt ADDED

@@ -0,0 +1,5 @@

+gradio>=4.0.0
+requests>=2.31.0
+python-multipart>=0.0.6
+aiofiles>=23.0.0
+gradio-client>=1.3.0
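
To make the version floors in this file easier to reason about, here is a minimal sketch that parses the five `>=` pins and compares a version string against them. It assumes every line has the plain `name>=X.Y.Z` shape used above; `parse_minimums` and `meets_minimum` are hypothetical names, and real requirement files can contain specifiers this parser does not handle.

```python
import re
from typing import Dict, Tuple

# Contents of requirements.txt as added in this commit.
REQUIREMENTS_TXT = """\
gradio>=4.0.0
requests>=2.31.0
python-multipart>=0.0.6
aiofiles>=23.0.0
gradio-client>=1.3.0
"""

# Assumption: only "name>=numeric.version" lines, as in the file above.
_LINE = re.compile(r"^([A-Za-z0-9._-]+)>=(\d+(?:\.\d+)*)$")


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.0.0' into (4, 0, 0). Pre-release suffixes are not supported."""
    return tuple(int(part) for part in version.split("."))


def parse_minimums(text: str) -> Dict[str, Tuple[int, ...]]:
    """Map each package name to its minimum version tuple."""
    minimums = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"Unsupported requirement line: {line!r}")
        name, version = match.groups()
        minimums[name] = _as_tuple(version)
    return minimums


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Compare versions after padding both to the same length with zeros."""
    have = _as_tuple(installed)
    width = max(len(have), len(minimum))
    pad = lambda parts: parts + (0,) * (width - len(parts))
    return pad(have) >= pad(minimum)
```

For real dependency resolution, a packaging-aware tool is the better choice; this sketch only shows what the floors in the file mean.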

spaces_config.json ADDED

@@ -0,0 +1,6 @@

+{
+  "app_file": "app.py",
+  "requirements": "requirements.txt",
+  "sdk": "gradio",
+  "sdk_version": "4.0.0"
+}
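
This commit declares `sdk`, `sdk_version`, and `app_file` twice: once in the README.md front matter and once in spaces_config.json. The diff does not say what consumes spaces_config.json, so the following is only a sketch of how the two could be checked for drift. `parse_flat_front_matter` and `find_mismatches` are hypothetical helpers, and the parser handles only flat `key: value` lines, not full YAML.

```python
import json
from typing import Dict, Tuple

# Subset of the front matter keys added to README.md in this commit.
README_FRONT_MATTER = """\
---
title: Omni API Gradio UI
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---
"""

# Contents of spaces_config.json as added in this commit.
SPACES_CONFIG_JSON = """\
{
  "app_file": "app.py",
  "requirements": "requirements.txt",
  "sdk": "gradio",
  "sdk_version": "4.0.0"
}
"""


def parse_flat_front_matter(text: str) -> Dict[str, str]:
    """Read flat 'key: value' pairs between the opening and closing '---' lines."""
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0] != "---" or "---" not in lines[1:]:
        raise ValueError("Front matter must be delimited by '---' lines")
    end = lines.index("---", 1)
    fields = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def find_mismatches(front: Dict[str, str], config: Dict[str, object]) -> Dict[str, Tuple[str, str]]:
    """Return keys present in both sources whose values differ."""
    shared = front.keys() & config.keys()
    return {
        key: (front[key], str(config[key]))
        for key in shared
        if front[key] != str(config[key])
    }


mismatches = find_mismatches(
    parse_flat_front_matter(README_FRONT_MATTER),
    json.loads(SPACES_CONFIG_JSON),
)
```

With the values from this commit, `mismatches` is empty because the three shared keys agree. A check like this would only matter if one of the two files is edited later without the other.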