================================================================================
AI USAGE REPORT (SUMMARY)
Cloudzy AI Challenge - Photo Album Management System
================================================================================
PROJECT OVERVIEW
================
AI-enhanced photo management system with semantic search, album summarization,
and image generation capabilities.
================================================================================
AI MODELS USED
================================================================================
1. IMAGE EMBEDDING: intfloat/multilingual-e5-large
   - Location: cloudzy/ai_utils.py (ImageEmbeddingGenerator)
   - Purpose: Convert photo metadata (text) into 1024-d vectors for similarity search
   - Used in: Photo upload, semantic search, album clustering
2. SUMMARIZATION: facebook/bart-large-cnn
   - Location: cloudzy/ai_utils.py (TextSummarizer)
   - Purpose: Generate summaries of photo clusters
   - Used in: /albums endpoint (creates album descriptions)
3. IMAGE ANALYSIS: Google Gemini 2.0 Flash
   - Location: cloudzy/agents/image_analyzer_2.py (ImageAnalyzerAgent)
   - Purpose: Analyze images and generate detailed descriptions
   - Used in: /generate-similar-image endpoint (Step 2)
4. IMAGE GENERATION: black-forest-labs/FLUX.1-dev
   - Location: cloudzy/inference_models/text_to_image.py (TextToImageGenerator)
   - Purpose: Generate high-quality images from text prompts
   - Used in: /generate-similar-image endpoint (Step 3)
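The similarity search these embeddings feed can be sketched in plain Python. This is a minimal stand-in for the FAISS index listed under manual work, assuming vectors are L2-normalized so that inner product equals cosine similarity. The functions `e5_input` and `search` and the toy 3-d vectors are hypothetical; real vectors are 1024-d. The E5 model card asks for `"query: "` and `"passage: "` prefixes on input text, but whether `ImageEmbeddingGenerator` applies them is not stated in this report.

```python
import math
from typing import Dict, List, Sequence, Tuple


def e5_input(text: str, kind: str) -> str:
    """Prefix text the way the E5 model card recommends ("query: " / "passage: ")."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return f"{kind}: {text}"


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(
    query: Sequence[float], index: Dict[int, Sequence[float]], k: int = 5
) -> List[Tuple[int, float]]:
    """Exhaustive inner-product search over normalized vectors (what a flat IP index computes)."""
    q = normalize(query)
    scored = []
    for photo_id, vec in index.items():
        v = normalize(vec)
        scored.append((photo_id, sum(a * b for a, b in zip(q, v))))
    # Highest similarity first, keep the top k hits.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = {1: [1.0, 0.0, 0.0], 2: [0.7, 0.7, 0.0], 3: [0.0, 0.0, 1.0]}
    print([pid for pid, _ in search([1.0, 0.1, 0.0], toy_index, k=2)])  # prints [1, 2]
```

A flat exhaustive scan is exact but linear in the number of photos. That is the same trade-off a flat FAISS index makes before any approximate indexing is introduced.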
================================================================================
MANUAL VS AI-GENERATED
================================================================================
MANUAL WORK (100% Developer-Written):
  - Database schema, API routes, file management
  - FastAPI application setup and middleware
  - Error handling and validation logic
  - File upload service and utilities
  - FAISS vector search implementation
HYBRID (Manual Integration + AI Models):
  - ImageEmbeddingGenerator: Text → 1024-d embeddings (AI model)
  - TextSummarizer: Metadata → album summary (AI model)
  - ImageAnalyzerAgent: Image → description (AI model)
  - TextToImageGenerator: Prompt → generated image (AI model)
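One way the embedding and summarization components could combine into album descriptions is sketched below. The sketch assumes a simple greedy cosine-threshold clustering, because the report does not say which clustering method `/albums` uses, and it takes the summarizer as a plain function. The names `cluster_by_threshold`, `build_summary_input`, and `describe_albums`, the 0.8 threshold, and the 3,000-character cap are all hypothetical. The cap is only a rough guard, since facebook/bart-large-cnn accepts at most 1,024 input tokens.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_threshold(
    vectors: Dict[int, Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Greedy clustering: each photo joins the first cluster whose seed is similar enough."""
    clusters: List[List[int]] = []
    seeds: List[Sequence[float]] = []
    for photo_id, vec in vectors.items():
        for i, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[i].append(photo_id)
                break
        else:  # no similar seed found: this photo starts a new cluster
            clusters.append([photo_id])
            seeds.append(vec)
    return clusters


def build_summary_input(captions: List[str], max_chars: int = 3000) -> str:
    """Join per-photo metadata into one passage, truncated as a rough length guard."""
    text = " ".join(c.strip().rstrip(".") + "." for c in captions if c.strip())
    return text[:max_chars]


def describe_albums(
    vectors: Dict[int, Sequence[float]],
    captions: Dict[int, str],
    summarize: Callable[[str], str],  # stand-in for the TextSummarizer call
    threshold: float = 0.8,
) -> List[dict]:
    albums = []
    for members in cluster_by_threshold(vectors, threshold):
        passage = build_summary_input([captions[pid] for pid in members])
        albums.append({"photo_ids": members, "summary": summarize(passage)})
    return albums
```

Greedy seeding depends on insertion order. A production version might prefer k-means or agglomerative clustering over the FAISS vectors.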
AI-GENERATED CONTENT:
  - Embedding vectors (semantic representations)
  - Album summaries (cluster descriptions)
  - Image descriptions (visual analysis)
  - Generated images (from text prompts)
================================================================================
NEW ENDPOINT: /generate-similar-image
================================================================================
Endpoint: POST /generate-similar-image
Location: cloudzy/routes/generate.py
Workflow:
  1. User uploads an image
  2. ImageAnalyzerAgent analyzes it and returns a description
  3. TextToImageGenerator creates a new image from that description
  4. The endpoint returns the generated image URL and the description
Response:
  {
    "description": "Detailed image analysis from Gemini",
    "generated_image_url": "http://127.0.0.1:8000/uploads/generated_20241025_123456_789.png",
    "message": "Similar image generated successfully"
  }
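The four workflow steps and the response shape above can be sketched as a framework-free function. The sketch makes three assumptions. First, the analyzer and generator are passed in as plain callables standing in for `ImageAnalyzerAgent` and `TextToImageGenerator`, whose real method names are not given here. Second, the generator returns PNG bytes. Third, file names follow the `generated_YYYYMMDD_HHMMSS_mmm.png` pattern suggested by the sample URL, with the last part read as milliseconds. The names `generate_similar_image`, `generated_filename`, and `public_url` are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def generated_filename(now: Optional[datetime] = None) -> str:
    """Timestamped name such as generated_20241025_123456_789.png (last part = milliseconds)."""
    now = now or datetime.now()
    return f"generated_{now:%Y%m%d_%H%M%S}_{now.microsecond // 1000:03d}.png"


def public_url(app_domain: str, filename: str) -> str:
    """Join APP_DOMAIN and the uploads path without doubling the slash."""
    return f"{app_domain.rstrip('/')}/uploads/{filename}"


def generate_similar_image(
    image_path: Path,
    analyze: Callable[[Path], str],    # step 2: image -> description (Gemini)
    generate: Callable[[str], bytes],  # step 3: description -> PNG bytes (FLUX.1-dev)
    uploads_dir: Path,
    app_domain: str = "http://127.0.0.1:8000/",
) -> dict:
    description = analyze(image_path)
    png_bytes = generate(description)
    filename = generated_filename()
    uploads_dir.mkdir(parents=True, exist_ok=True)
    (uploads_dir / filename).write_bytes(png_bytes)
    # Step 4: same keys as the documented response.
    return {
        "description": description,
        "generated_image_url": public_url(app_domain, filename),
        "message": "Similar image generated successfully",
    }
```

Injecting the two model calls keeps the route testable without network access. It also mirrors the composable pipeline listed under key decisions.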
Performance: ~40-75 seconds per request
  - Image analysis: ~5-10 s (Gemini)
  - Image generation: ~30-60 s (FLUX.1-dev)
  - The two model calls account for ~35-70 s; the remaining ~5 s is spent outside them
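Per-request timings could be recorded to check the per-stage figures above. The sketch below uses a small context manager built on `time.perf_counter`, which is intended for measuring elapsed intervals. The names `stage_timer` and `total_seconds` are hypothetical, and `time.sleep` stands in for the real model calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def stage_timer(timings: Dict[str, float], stage: str) -> Iterator[None]:
    """Record elapsed seconds for the wrapped block under `stage`, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def total_seconds(timings: Dict[str, float]) -> float:
    """Sum of all recorded stage durations."""
    return sum(timings.values())


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with stage_timer(timings, "analysis"):
        time.sleep(0.01)  # stand-in for the Gemini call
    with stage_timer(timings, "generation"):
        time.sleep(0.02)  # stand-in for the FLUX.1-dev call
    print(sorted(timings))  # prints ['analysis', 'generation']
```

Comparing `total_seconds` with the end-to-end request time shows how much of each request falls outside the two model calls.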
================================================================================
PROMPTS USED IN PROJECT
================================================================================
1. IMAGE ANALYSIS PROMPT (ImageAnalyzerAgent - Gemini):
   Location: cloudzy/agents/image_analyzer_2.py
   "Describe this image in a way that could be used as a prompt for generating
   a new image inspired by it. Focus on the main subjects, composition, style,
   mood, and colors. Avoid mentioning specific names or exact details - instead,
   describe the overall aesthetic and atmosphere so the result feels similar but
   not identical."
2. IMAGE GENERATION PROMPT:
   - Input: Description from ImageAnalyzerAgent (above)
   - Model: FLUX.1-dev (black-forest-labs/FLUX.1-dev)
   - Location: cloudzy/inference_models/text_to_image.py
   - Strategy: The description is passed unchanged as the generation prompt
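The two prompt hand-offs can be sketched using the `google-generativeai` package (`genai.GenerativeModel(...).generate_content`) for Gemini and `huggingface_hub.InferenceClient.text_to_image` for FLUX.1-dev. These libraries are an assumption: the project's own agent and generator classes may call these services differently. The function names are hypothetical. Imports are deferred into the functions so the module loads without either package installed.

```python
import os
from pathlib import Path

ANALYSIS_PROMPT = (
    "Describe this image in a way that could be used as a prompt for generating "
    "a new image inspired by it. Focus on the main subjects, composition, style, "
    "mood, and colors. Avoid mentioning specific names or exact details - instead, "
    "describe the overall aesthetic and atmosphere so the result feels similar but "
    "not identical."
)


def build_generation_prompt(description: str) -> str:
    """Direct prompt passing: the description is used as-is, only trimmed."""
    prompt = description.strip()
    if not prompt:
        raise ValueError("empty description; nothing to generate from")
    return prompt


def describe_image(image_path: Path) -> str:
    """Step 2 (assumed client library): send the prompt plus the image to Gemini."""
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    with Image.open(image_path) as img:
        response = model.generate_content([ANALYSIS_PROMPT, img])
    return response.text


def generate_image(description: str, out_path: Path) -> Path:
    """Step 3 (assumed client library): text-to-image via the Hugging Face Inference API."""
    from huggingface_hub import InferenceClient

    client = InferenceClient(token=os.environ["HF_TOKEN_1"])
    image = client.text_to_image(
        build_generation_prompt(description), model="black-forest-labs/FLUX.1-dev"
    )
    image.save(out_path)  # text_to_image returns a PIL.Image
    return out_path
```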
================================================================================
ENVIRONMENT VARIABLES
================================================================================
Required:
  - HF_TOKEN: Hugging Face API token (embeddings, summarization)
  - GEMINI_API_KEY: Google Gemini API key (image analysis)
  - HF_TOKEN_1: Second Hugging Face token (image generation)
Optional:
  - APP_DOMAIN: Public base URL of the app (default: http://127.0.0.1:8000/)
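Loading these variables once at startup makes a missing key fail immediately rather than mid-request. The sketch below reads the three required keys and falls back to the documented default for `APP_DOMAIN`. The `Settings` class and its field names are hypothetical, and failing fast is a policy choice assumed here, not one the report states.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("HF_TOKEN", "GEMINI_API_KEY", "HF_TOKEN_1")
DEFAULT_APP_DOMAIN = "http://127.0.0.1:8000/"


@dataclass(frozen=True)
class Settings:
    hf_token: str        # embeddings, summarization
    gemini_api_key: str  # image analysis
    hf_token_image: str  # image generation (HF_TOKEN_1)
    app_domain: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Read settings from a mapping (os.environ by default), listing every missing key."""
        env = os.environ if env is None else env
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            hf_token=env["HF_TOKEN"],
            gemini_api_key=env["GEMINI_API_KEY"],
            hf_token_image=env["HF_TOKEN_1"],
            app_domain=env.get("APP_DOMAIN", DEFAULT_APP_DOMAIN),
        )
```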
================================================================================
KEY DECISIONS
================================================================================
1. Chose FLUX.1-dev over Stable Diffusion for higher-quality image generation
2. Composable pipeline: ImageAnalyzerAgent → TextToImageGenerator (reusable components)
3. Graceful error handling, with fallbacks when external APIs are unavailable
4. Temporary file handling: uploads are saved locally so Gemini can analyze them
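Decisions 3 and 4 can be sketched with the standard library. `temp_upload` writes uploaded bytes to a temporary file that is always deleted afterwards. `with_fallback` returns a fallback result instead of propagating an API error. Both helpers are hypothetical. What the project's real fallbacks return is not stated in the report, so the fallback is left to the caller.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def temp_upload(data: bytes, suffix: str = ".png") -> Iterator[Path]:
    """Write uploaded bytes to a temporary file and delete it when the block exits."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield Path(name)
    finally:
        Path(name).unlink(missing_ok=True)  # cleanup even if analysis raised


def with_fallback(primary: Callable[[], T], fallback: Callable[[Exception], T]) -> T:
    """Run `primary`; on any exception, return `fallback(error)` instead of failing."""
    try:
        return primary()
    except Exception as exc:  # e.g. network errors or quota limits from a remote API
        return fallback(exc)
```

Usage might look like `with temp_upload(raw_bytes) as path: description = with_fallback(lambda: analyze(path), lambda err: "")`, where `analyze` is a hypothetical stand-in for the Gemini analysis step.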
================================================================================
SUMMARY
================================================================================
This project integrates 4 AI models, each limited to a specific task:
  - Embeddings for semantic search
  - Summarization for album descriptions
  - Vision AI for image analysis
  - Generative AI for image creation
Infrastructure, application logic, validation, and error handling are written
manually; AI models are called only for their specialized tasks.
New capability: generating creative variations of uploaded photos through an
analysis-then-generation pipeline (Gemini → FLUX.1-dev).
================================================================================