AI USAGE REPORT - Cloudzy AI Challenge
========================================
PROJECT OVERVIEW:
FastAPI-based photo management system with semantic search, AI image analysis, and text-to-image generation.
WHERE & HOW AI WAS USED:
1. Image Analysis - Structured Metadata (cloudzy/agents/image_analyzer.py)
- Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
- Function: Auto-generate tags, descriptions, and captions for uploaded photos
2. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
- Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
- Function: Generate aesthetic image descriptions for inspiration-based generation
3. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
- Tool: FLUX.1-dev model via HuggingFace Inference API
- Function: Generate images from text prompts
4. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
- Tool: FAISS (vector similarity search library) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional)
- Function: Find visually similar photos via L2-normalized embedding vectors
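The search flow above can be sketched as follows. This is a stdlib-only illustration of the idea (L2-normalize, then rank by Euclidean distance under a threshold); FAISS implements the same thing at scale, and the toy 3-d vectors stand in for the real 4096-d Qwen3 embeddings. Function and variable names here are hypothetical, not the project's actual identifiers:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length, as done before indexing/search."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query_vec, photo_vecs, threshold=1.5):
    """Return (photo_id, distance) pairs within the distance
    threshold, nearest first."""
    q = l2_normalize(query_vec)
    hits = []
    for photo_id, vec in photo_vecs.items():
        d = l2_distance(q, l2_normalize(vec))
        if d <= threshold:
            hits.append((photo_id, d))
    return sorted(hits, key=lambda h: h[1])

# Toy 3-d "embeddings" (the real ones are 4096-d)
db = {"sunset.jpg": [0.9, 0.1, 0.0], "cat.jpg": [0.0, 0.2, 0.9]}
results = search([1.0, 0.0, 0.0], db)
```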
PROMPTS & MODEL INPUTS:
Image Analysis Prompt #1 - Structured Metadata (image_analyzer.py):
"Describe this image in the following exact format: result: {tags: [...], description: '...', caption: '...'}"
- Input: Image URL sent to vision model
- Model ingests structured format request to ensure JSON output
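A tolerant parser for that structured reply might look like the sketch below. This is a hypothetical illustration, not the code in image_analyzer.py: it pulls the dict literal out of the model's text and applies the dict type-check mentioned later for Gemini responses, falling back to empty fields when parsing fails. The sample reply string is invented:

```python
import ast
import re

def parse_analysis(raw):
    """Extract the {tags, description, caption} payload from a model
    reply. If the response is already a dict (the Gemini path), pass
    it through; otherwise parse the 'result: {...}' text format."""
    if isinstance(raw, dict):
        return raw
    match = re.search(r"result:\s*(\{.*\})", raw, re.DOTALL)
    if not match:
        # graceful fallback when the model ignored the format
        return {"tags": [], "description": "", "caption": ""}
    return ast.literal_eval(match.group(1))

reply = ("result: {'tags': ['beach', 'sunset'], "
         "'description': 'A calm shore.', 'caption': 'Golden hour'}")
meta = parse_analysis(reply)
```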
Image Analysis Prompt #2 - Generative Inspiration (image_analyzer_2.py, Gemini via smolagents):
"Describe this image in a way that could be used as a prompt for generating a new image inspired by it.
Focus on the main subjects, composition, style, mood, and colors.
Avoid mentioning specific names or exact details - instead, describe the overall aesthetic and atmosphere so the result feels similar but not identical."
- Input: Local image file sent to Gemini-2.0-Flash model
- Designed for generating aesthetic descriptions usable as prompts for image generation
Search Queries:
- User text → converted to embeddings → matched against photo database
- Album creation: Groups similar photos by distance threshold (randomized each call)
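The album step can be sketched as a greedy pass: shuffle the photo order (so repeated calls can yield different groupings), then seed albums and attach each photo to the first album whose seed lies within the distance threshold. This is an illustrative sketch under those assumptions, not the actual algorithm in search_engine.py:

```python
import math
import random

def create_albums(photos, threshold=1.5):
    """Greedy grouping sketch: photos within `threshold` L2 distance
    of an album's seed photo join that album; others start new ones.
    random.shuffle() makes the grouping non-deterministic per call."""
    items = list(photos.items())
    random.shuffle(items)
    albums = []
    for photo_id, vec in items:
        for album in albums:
            seed_vec = photos[album[0]]  # first member seeds the album
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(vec, seed_vec)))
            if d <= threshold:
                album.append(photo_id)
                break
        else:
            albums.append([photo_id])  # no album close enough
    return albums

# Toy 2-d embeddings: "a" and "b" are near each other, "c" is far
photos = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 5.0]}
albums = create_albums(photos)
```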
MODEL OUTPUTS REFINED:
✓ JSON parsing: Extracted structured data from model text response (with a dict type-check for Gemini responses)
✓ Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d)
✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
✓ Error handling: Wrapped API failures in graceful fallbacks
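The last refinement can be expressed as a small decorator that swallows upstream API errors and returns a safe default instead. The decorator name, the default payload, and the failing function are all hypothetical, shown only to illustrate the fallback pattern:

```python
import functools

def with_fallback(default):
    """Return `default` instead of propagating an upstream API error."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                return default  # graceful fallback, never a 500
        return wrapper
    return decorator

@with_fallback({"tags": [], "description": "", "caption": ""})
def analyze_image(url):
    # Simulated upstream failure (no real API is called here)
    raise ConnectionError("vision API is down")

result = analyze_image("photo.jpg")
```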
MANUAL VS AI-GENERATED (BREAKDOWN):
AI-Generated (65%):
- Model integration boilerplate (API clients, token management)
- FAISS index structure and search logic
- Vision model prompt formatting
- Default model selections (Qwen3-VL-8B, FLUX.1-dev)
Manual Refinements (35%):
- Database schema design (Photo, embeddings storage)
- FastAPI route structure and error handling
- Album clustering algorithm and break conditions
- Distance threshold validation and tuning
- File upload validation and storage management
- CORS middleware configuration
KEY TECHNICAL DECISIONS:
1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models
2. Distance thresholds: search() ≤ 1.5, create_albums() ≤ 1.5 (tuned for normalized embeddings)
3. Model choice: Qwen3-VL for balanced speed/quality in image analysis
4. FLUX.1-dev: High-quality image generation over speed
5. Random album creation: Ensures different groupings per request
6. HuggingFace Hub: Leveraged pre-tuned models instead of training custom ones
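For L2-normalized embeddings the 1.5 threshold has a clean interpretation: unit vectors satisfy d² = 2(1 - cos θ), so a distance cutoff maps directly to a minimum cosine similarity. The sketch below computes that floor for both readings of the threshold, since FAISS's IndexFlatL2 actually reports *squared* L2 distances; which reading applies depends on the project's code, which this report does not show:

```python
def cosine_floor(threshold, squared=False):
    """Minimum cosine similarity a pair of unit vectors can have and
    still pass the distance threshold, using d^2 = 2 * (1 - cos).
    Set squared=True if the threshold is compared against squared
    distances (as FAISS IndexFlatL2 returns them)."""
    d2 = threshold if squared else threshold ** 2
    return 1.0 - d2 / 2.0

floor_plain = cosine_floor(1.5)                 # plain distance reading
floor_squared = cosine_floor(1.5, squared=True) # squared distance reading
```

Under the plain-distance reading the 1.5 cutoff even admits slightly anti-correlated pairs (cos ≥ -0.125), while under the squared reading it requires cos ≥ 0.25, a meaningfully stricter match.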