| AI USAGE REPORT - Cloudzy AI Challenge | |
| ======================================== | |
| PROJECT OVERVIEW: | |
| FastAPI-based photo management system with semantic search, AI image analysis, and text-to-image generation. | |
| WHERE & HOW AI WAS USED: | |
| 1. Image Analysis - Structured Metadata (cloudzy/agents/image_analyzer.py) | |
| - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API | |
| - Function: Auto-generate tags, descriptions, and captions for uploaded photos | |
| 2. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py) | |
| - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API | |
| - Function: Generate aesthetic image descriptions for inspiration-based generation | |
| 3. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py) | |
| - Tool: FLUX.1-dev model via HuggingFace Inference API | |
| - Function: Generate images from text prompts | |
| 4. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py) | |
| - Tool: FAISS (vector database) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional) | |
| - Function: Find visually similar photos via L2-normalized embedding vectors | |
| PROMPTS & MODEL INPUTS: | |
| Image Analysis Prompt #1 - Structured Metadata (image_analyzer.py): | |
| "Describe this image in the following exact format: result: {tags: [...], description: '...', caption: '...'}" | |
| - Input: Image URL sent to vision model | |
| - The prompt requests a fixed structured format so the model's response can be parsed as JSON | |
| Image Analysis Prompt #2 - Generative Inspiration (image_analyzer_2.py, Gemini via smolagents): | |
| "Describe this image in a way that could be used as a prompt for generating a new image inspired by it. | |
| Focus on the main subjects, composition, style, mood, and colors. | |
| Avoid mentioning specific names or exact details - instead, describe the overall aesthetic and atmosphere so the result feels similar but not identical." | |
| - Input: Local image file sent to Gemini-2.0-Flash model | |
| - Designed for generating aesthetic descriptions usable as prompts for image generation | |
| Search Queries: | |
| - User text → converted to embeddings → matched against photo database | |
| - Album creation: Groups similar photos by distance threshold (randomized each call) | |
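The search and album steps above can be sketched in plain Python. This is only an illustration, not the project's FAISS code: the helper names `l2_normalize` and `squared_l2` are hypothetical, the greedy grouping strategy is an assumption, and the threshold is assumed to apply to squared L2 distance (the report does not say whether 1.5 is squared or not).

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query_vec, photo_vecs, threshold=1.5):
    """Return (photo_id, distance) pairs within the threshold, nearest first."""
    q = l2_normalize(query_vec)
    hits = [(pid, squared_l2(q, l2_normalize(v))) for pid, v in photo_vecs.items()]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])


def create_albums(photo_vecs, threshold=1.5, rng=random):
    """Greedily group photos around seeds visited in random order (assumed strategy)."""
    ids = list(photo_vecs)
    rng.shuffle(ids)  # random seed order, so groupings can differ between calls
    unit = {pid: l2_normalize(photo_vecs[pid]) for pid in ids}
    albums, assigned = [], set()
    for seed in ids:
        if seed in assigned:
            continue
        album = [
            pid for pid in ids
            if pid not in assigned and squared_l2(unit[seed], unit[pid]) <= threshold
        ]
        assigned.update(album)
        albums.append(album)
    return albums
```

Passing a seeded `random.Random` instance as `rng` makes the grouping reproducible, which may help when tuning the threshold.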
| MODEL OUTPUTS REFINED: | |
| - JSON parsing: Extracted structured data from the model's text response (with a dict type-check for Gemini responses) | |
| - Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d) | |
| - Album randomization: Added random.shuffle() to prevent deterministic groupings | |
| - Error handling: Wrapped API calls so that failures fall back gracefully | |
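The JSON parsing and fallback refinements could look roughly like the sketch below. The function names `parse_metadata` and `empty_metadata` are hypothetical, and the assumption that the model returns a JSON-parsable object somewhere in its text is mine; the actual parser in image_analyzer.py may differ.

```python
import json


def empty_metadata():
    """Fresh default metadata, used as the graceful fallback."""
    return {"tags": [], "description": "", "caption": ""}


def parse_metadata(response):
    """Extract tags/description/caption from a model response.

    Accepts either an already-parsed dict or raw text that contains
    a JSON object; anything unparsable yields the empty defaults.
    """
    if isinstance(response, dict):  # dict type-check: already structured
        data = response.get("result", response)
    else:
        text = str(response)
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return empty_metadata()
        try:
            data = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return empty_metadata()
        # Unwrap {"result": {...}} if the model nested its answer
        if isinstance(data, dict) and isinstance(data.get("result"), dict):
            data = data["result"]
    if not isinstance(data, dict):
        return empty_metadata()
    result = empty_metadata()
    result.update({key: data[key] for key in result if key in data})
    return result
```

Returning defaults instead of raising keeps an upload from failing just because the model's reply was malformed.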
| MANUAL VS AI-GENERATED (BREAKDOWN): | |
| AI-Generated (65%): | |
| - Model integration boilerplate (API clients, token management) | |
| - FAISS index structure and search logic | |
| - Vision model prompt formatting | |
| - Default model selections (Qwen3-VL-8B, FLUX.1-dev) | |
| Manual Refinements (35%): | |
| - Database schema design (Photo, embeddings storage) | |
| - FastAPI route structure and error handling | |
| - Album clustering algorithm and break conditions | |
| - Distance threshold validation and tuning | |
| - File upload validation and storage management | |
| - CORS middleware configuration | |
| KEY TECHNICAL DECISIONS: | |
| 1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models | |
| 3. Distance thresholds: search() ≤ 1.5, create_albums() ≤ 1.5 (optimized for normalized embeddings) | |
| 4. Model choice: Qwen3-VL for balanced speed/quality in image analysis | |
| 5. FLUX.1-dev: Prioritizes image quality over generation speed | |
| 6. Random album creation: Ensures different groupings per request | |
| 7. HuggingFace Hub: Used pretrained models from the Hub instead of training custom ones | |
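For L2-normalized embeddings, a distance threshold is equivalent to a cosine-similarity cutoff. The worked relation below shows what a threshold of 1.5 would mean. The report does not state whether the threshold applies to the squared or the plain L2 distance, so both readings are given; here a and b are unit-length embeddings and θ is the angle between them.

```latex
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\cos\theta
\qquad (\|a\| = \|b\| = 1)

\text{squared distance: } \|a - b\|^2 \le 1.5 \iff \cos\theta \ge 0.25

\text{plain distance: } \|a - b\| \le 1.5 \iff \cos\theta \ge 1 - \tfrac{1.5^2}{2} = -0.125
```

Under either reading the cutoff is fairly permissive, which suits broad album grouping more than strict near-duplicate matching.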