license: unknown
---
# 🖼️ Image to 🎧 Audio Story Generator

This project showcases an end-to-end pipeline that transforms an image into an audio story using a chain of AI models and tools.
## 📋 Overview

The goal of this project is to convert an uploaded image into an audio story. It combines image captioning, text generation, and text-to-speech models.
## 🌟 Features
### 📷 Image Captioning

- Uses Salesforce's `blip-image-captioning-base` model to generate a textual description of the uploaded image.
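As a rough sketch (not the project's actual code), captioning with BLIP can go through the `transformers` image-to-text pipeline; the helper that unpacks the pipeline output is kept separate so it can be exercised without downloading the model:

```python
def extract_caption(outputs: list) -> str:
    # The image-to-text pipeline returns a list like [{"generated_text": "..."}]
    return outputs[0]["generated_text"].strip()

def caption_image(image_path: str) -> str:
    # Deferred import: transformers is a heavy dependency, and the model
    # weights are downloaded on first use.
    from transformers import pipeline
    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-base")
    return extract_caption(captioner(image_path))
```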
### ✍️ Text Generation (Story Creation)

- Uses the `togethercomputer/llama-2-70b-chat` model (served through the Together AI API) to write a short story of 100 words or fewer, based on the image caption and ending on a positive note.
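A minimal sketch of the story-generation step, assuming Together AI's OpenAI-compatible chat completions endpoint (the prompt wording and `max_tokens` value are illustrative, not taken from the project):

```python
import json
import os
import urllib.request

PROMPT = (
    "Write a short story of 100 words or fewer, ending on a positive note, "
    "inspired by this scene: {caption}"
)

def build_prompt(caption: str) -> str:
    # Kept separate from the network call so it can be tested in isolation
    return PROMPT.format(caption=caption)

def generate_story(caption: str) -> str:
    payload = {
        "model": "togethercomputer/llama-2-70b-chat",
        "messages": [{"role": "user", "content": build_prompt(caption)}],
        "max_tokens": 256,
    }
    req = urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```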
### 🔊 Text-to-Speech Conversion

- Uses the `espnet/kan-bayashi_ljspeech_vits` model via the Hugging Face Inference API to convert the generated story into an audio file.
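The text-to-speech call might look like the following sketch, assuming the Hugging Face Inference API returns raw audio bytes for this model (the output filename is a placeholder); the request builder is separated so the payload can be inspected without a network call:

```python
import json
import os
import urllib.request

TTS_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"

def tts_request(text: str) -> urllib.request.Request:
    # Build the POST request up front so it can be inspected offline
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps({"inputs": text}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}",
            "Content-Type": "application/json",
        },
    )

def synthesize(text: str, out_path: str = "story.flac") -> str:
    # The Inference API responds with raw audio bytes for TTS models
    with urllib.request.urlopen(tts_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```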
### 🌐 Streamlit Web App

- A Streamlit front end lets users upload an image and view the generated caption, story, and audio in one place.
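A skeleton of what the app's entry point could look like (illustrative, not the actual `app.py`); the extension check is factored out so it is testable without Streamlit installed:

```python
SUPPORTED = {"jpg", "jpeg", "png"}

def is_supported(filename: str) -> bool:
    # Mirror the uploader's accepted extensions
    return filename.lower().rsplit(".", 1)[-1] in SUPPORTED

def main():
    import streamlit as st  # deferred: only needed when the app actually runs
    st.title("Image to Audio Story Generator")
    uploaded = st.file_uploader("Upload an image", type=sorted(SUPPORTED))
    if uploaded is not None:
        st.image(uploaded)
        # ...caption the image, generate the story, synthesize the audio,
        # then display them with st.write(...) and st.audio(...)

if __name__ == "__main__":
    main()
```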
## 🚀 Usage

To use this application:

1. Clone this repository.
2. Install the required dependencies with `pip install -r requirements.txt`.
3. Set the necessary environment variables:
   - `TOGETHER_API_KEY`: your Together AI API key.
   - `HUGGINGFACEHUB_API_TOKEN`: your Hugging Face API token.
4. Run the Streamlit app with `streamlit run app.py`.
5. Upload an image file (supported formats: jpg, jpeg, png).
6. Wait while the models generate the caption, story, and audio.
7. Review the image caption, story, and audio outputs.
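The setup steps above condense into a short shell session (the export values are placeholders for your own credentials):

```shell
# From the repository root, install dependencies
pip install -r requirements.txt

# Provide API credentials (placeholder values shown)
export TOGETHER_API_KEY="your-together-ai-key"
export HUGGINGFACEHUB_API_TOKEN="your-hf-token"

# Launch the web app
streamlit run app.py
```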
## 📂 Code Structure

- `app.py`: the Streamlit application, integrating captioning, story generation, and text-to-speech.
- `README.md`: project documentation, usage instructions, and dependencies.
- `requirements.txt`: the required Python libraries.
## 🙌 Credits

This project was created with love by @Aditya-Neural-Net-Ninja. It makes use of cutting-edge AI models for image analysis, natural language processing, and text-to-speech conversion. Special thanks to Streamlit and Hugging Face for their incredible platforms.
**Note:** Make sure you have valid Together AI and Hugging Face API credentials before running this application.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference