AI USAGE REPORT - Cloudzy AI Challenge
========================================

PROJECT OVERVIEW:
FastAPI-based photo management system with semantic search, AI image analysis, and text-to-image generation.

WHERE & HOW AI WAS USED:

1. Image Analysis - Structured Metadata (cloudzy/agents/image_analyzer.py)
   - Tool: Qwen/Qwen3-VL-8B-Instruct model via the Hugging Face API
   - Function: Auto-generates tags, descriptions, and captions for uploaded photos

1b. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
   - Tool: Gemini-2.0-Flash model via smolagents and an OpenAI-compatible API
   - Function: Generates aesthetic image descriptions for inspiration-based generation

2. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
   - Tool: FLUX.1-dev model via the Hugging Face Inference API
   - Function: Generates images from text prompts

3. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
   - Tool: FAISS (vector similarity search library) over embedding vectors
   - Function: Finds visually similar photos by comparing embedding vectors

PROMPTS & MODEL INPUTS:

Image Analysis Prompt #1 - Structured Metadata (image_analyzer.py):
   "Describe this image in the following exact format: result: {tags: [...], description: '...', caption: '...'}"
   - Input: Image URL sent to the vision model
   - The fixed format request steers the model toward JSON-like output that can be parsed into structured fields; it does not guarantee valid JSON, so the response is parsed defensively

Image Analysis Prompt #2 - Generative Inspiration (image_analyzer_2.py, Gemini via smolagents):
   "Describe this image in a way that could be used as a prompt for generating a new image inspired by it. Focus on the main subjects, composition, style, mood, and colors. Avoid mentioning specific names or exact details — instead, describe the overall aesthetic and atmosphere so the result feels similar but not identical."
   - Input: Local image file sent to the Gemini-2.0-Flash model
   - Designed to produce aesthetic descriptions usable as prompts for image generation

Search Queries:
- User text → converted to embeddings → matched against the photo database
- Album creation: Groups similar photos by a distance threshold; photo order is shuffled on each call, so groupings vary between calls

MODEL OUTPUTS REFINED:
✓ JSON parsing: Extracted structured data from the model's text response
✓ Distance threshold tuning: Adjusted for FAISS L2 distance (default 0.3)
✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
✓ Error handling: Wrapped API calls so that failures fall back gracefully

MANUAL VS AI-GENERATED (BREAKDOWN):

AI-Generated (65%):
   - Model integration boilerplate (API clients, token management)
   - FAISS index structure and search logic
   - Vision model prompt formatting
   - Default model selections (Qwen3-VL-8B, FLUX.1-dev)

Manual Refinements (35%):
   - Database schema design (Photo, embeddings storage)
   - FastAPI route structure and error handling
   - Album clustering algorithm and break conditions
   - Distance threshold validation and tuning
   - File upload validation and storage management
   - CORS middleware configuration

KEY TECHNICAL DECISIONS:
1. Distance threshold = 0.3: Only photos within this L2 distance of each other are treated as similar
2. Model choice: Qwen3-VL for a balance of speed and quality
3. FLUX.1-dev: Prioritizes image quality over generation speed
4. Random album creation: Makes groupings vary from one request to the next
5. Hugging Face Hub: Uses pretrained models instead of training custom ones

FILES MODIFIED FOR IMPROVEMENTS:
- search_engine.py: Added randomization and album count control
- image_analyzer.py: JSON error handling for vision model output
- image_analyzer_2.py: Agentic image analysis with Gemini-2.0-Flash for aesthetic descriptions
- text_to_image.py: Timestamp-based filenames to prevent collisions
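The JSON parsing and error-handling steps listed under MODEL OUTPUTS REFINED can be illustrated with a minimal sketch. The format requested in Prompt #1 uses bare keys and single quotes, so it is not strict JSON. This sketch assumes the reply may arrive either as real JSON or in that literal form, tries both, and falls back to empty fields otherwise. The names parse_analysis and empty_result are hypothetical and are not taken from image_analyzer.py.

```python
import ast
import json
import re
from typing import Any


def empty_result() -> dict[str, Any]:
    """Fresh fallback object (a new list each call, so callers may mutate it)."""
    return {"tags": [], "description": "", "caption": ""}


def parse_analysis(text: str) -> dict[str, Any]:
    """Parse a vision-model reply shaped like
    result: {tags: [...], description: '...', caption: '...'}

    Tries strict JSON first, then a Python-literal parse after quoting bare
    keys, and returns empty_result() if both fail.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return empty_result()
    blob = text[start : end + 1]
    try:
        data = json.loads(blob)
    except json.JSONDecodeError:
        # Quote bare keys (tags: -> 'tags':). This simple regex can misfire on
        # values that contain ", word:"; parsing then fails and we fall back.
        quoted = re.sub(r"([{,]\s*)([A-Za-z_]\w*)\s*:", r"\1'\2':", blob)
        try:
            data = ast.literal_eval(quoted)
        except (ValueError, SyntaxError, TypeError):
            return empty_result()
    if not isinstance(data, dict):
        return empty_result()
    # Unwrap {"result": {...}} in case the model used "result" as a key.
    if isinstance(data.get("result"), dict):
        data = data["result"]
    result = empty_result()
    for key in result:
        if key in data:
            result[key] = data[key]
    return result
```

Returning empty fields instead of raising keeps an upload from failing just because the model drifted from the requested format.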
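The Search Queries flow and the 0.3 distance threshold can be sketched without FAISS as a brute-force nearest-neighbour scan, which is what an exact flat index does. One detail worth checking in search_engine.py: faiss.IndexFlatL2 reports squared L2 distances, so if the project uses that index type (an assumption), the 0.3 threshold applies to the squared value. This sketch mirrors that scale. The in-memory dict of photo IDs to embeddings and the search function are hypothetical stand-ins, and the report does not name the embedding model.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the scale faiss.IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(
    query: Vector,
    index: dict[int, Vector],
    k: int = 5,
    threshold: float = 0.3,
) -> list[tuple[int, float]]:
    """Return up to k (photo_id, distance) pairs within threshold, nearest first."""
    # Score every stored embedding, then sort by distance (ties by photo ID).
    scored = sorted((squared_l2(query, vec), pid) for pid, vec in index.items())
    # Take the k nearest and drop any that exceed the similarity threshold.
    return [(pid, dist) for dist, pid in scored[:k] if dist <= threshold]
```

Because distances are compared on the squared scale, a threshold tuned against plain L2 distances would need to be squared before use here.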
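The album creation described under Search Queries (threshold grouping, random.shuffle(), album count control, and break conditions) could look roughly like the greedy sketch below. The report does not spell out the clustering algorithm, so the seed-based approach, the create_albums function, and the max_albums and min_size parameters are all assumptions for illustration. Distances are squared L2, as faiss.IndexFlatL2 reports them.

```python
import random
from typing import Optional, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def create_albums(
    embeddings: dict[int, Sequence[float]],
    threshold: float = 0.3,
    max_albums: int = 5,
    min_size: int = 2,
    rng: Optional[random.Random] = None,
) -> list[list[int]]:
    """Greedily group photo IDs lying within `threshold` of a seed photo."""
    if rng is None:
        rng = random.Random()
    ids = list(embeddings)
    rng.shuffle(ids)  # new seed order on each call, so groupings differ
    unassigned = set(ids)
    albums: list[list[int]] = []
    for seed in ids:
        if len(albums) >= max_albums:
            break  # album count control
        if seed not in unassigned:
            continue  # already placed in an earlier album
        members = [
            pid
            for pid in ids
            if pid in unassigned
            and squared_l2(embeddings[seed], embeddings[pid]) <= threshold
        ]
        if len(members) < min_size:
            continue  # too few close neighbours; seed stays available
        albums.append(members)
        unassigned.difference_update(members)
    return albums
```

Passing a seeded random.Random makes results reproducible in tests, while production calls without a seed still vary.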
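The timestamp-based collision prevention noted for text_to_image.py can be sketched as follows. A timestamp alone can still collide when two images are saved within the same microsecond, so this version also opens the file in exclusive-create mode ("xb") and appends a counter if the name is taken. The save_generated_image function, the "generated" prefix, and the ".png" suffix are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_generated_image(
    data: bytes, directory: Path, prefix: str = "generated", suffix: str = ".png"
) -> Path:
    """Write image bytes under a timestamped name that never overwrites a file."""
    directory.mkdir(parents=True, exist_ok=True)
    # Microsecond UTC timestamp, e.g. generated_20250101T120000_123456.png
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S_%f")
    counter = 0
    while True:
        tail = "" if counter == 0 else f"_{counter}"
        path = directory / f"{prefix}_{stamp}{tail}{suffix}"
        try:
            # Mode "x" creates the file atomically and fails if it already
            # exists, avoiding the race of a separate exists() check.
            with path.open("xb") as fh:
                fh.write(data)
            return path
        except FileExistsError:
            counter += 1
```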