matinsn2000 committed on
Commit
ab19ad9
·
1 Parent(s): 6c9b9e1

Updated AI_USAGE_REPORT file

AI_USAGE_REPORT.txt CHANGED
@@ -1,136 +1,73 @@
1
- ================================================================================
2
- AI USAGE REPORT (SUMMARY)
3
- Cloudzy AI Challenge - Photo Album Management System
4
- ================================================================================
5
 
6
- PROJECT OVERVIEW
7
- ================
8
- AI-enhanced photo management system with semantic search, album summarization,
9
- and image generation capabilities.
10
 
11
- ================================================================================
12
- AI MODELS USED
13
- ================================================================================
14
-
15
- 1. IMAGE EMBEDDING: intfloat/multilingual-e5-large
16
- - Location: cloudzy/ai_utils.py (ImageEmbeddingGenerator)
17
- - Purpose: Convert photo metadata into 1024-d vectors for similarity search
18
- - Used in: Photo upload, semantic search, album clustering
19
-
20
- 2. SUMMARIZATION: facebook/bart-large-cnn
21
- - Location: cloudzy/ai_utils.py (TextSummarizer)
22
- - Purpose: Generate summaries of photo clusters
23
- - Used in: /albums endpoint (creates album descriptions)
24
-
25
- 3. IMAGE ANALYSIS: Google Gemini 2.0-flash
26
- - Location: cloudzy/agents/image_analyzer_2.py (ImageAnalyzerAgent)
27
- - Purpose: Analyze images and generate detailed descriptions
28
- - Used in: /generate-similar-image endpoint (Step 1)
29
-
30
- 4. IMAGE GENERATION: black-forest-labs/FLUX.1-dev
31
- - Location: cloudzy/inference_models/text_to_image.py (TextToImageGenerator)
32
- - Purpose: Generate high-quality images from text prompts
33
- - Used in: /generate-similar-image endpoint (Step 3)
34
-
35
- ================================================================================
36
- MANUAL VS AI-GENERATED
37
- ================================================================================
38
-
39
- MANUAL WORK (100% Developer-Written):
40
- ✓ Database schema, API routes, file management
41
- ✓ FastAPI application setup and middleware
42
- ✓ Error handling and validation logic
43
- ✓ File upload service and utilities
44
- ✓ FAISS vector search implementation
45
-
46
- HYBRID (Manual Integration + AI Models):
47
- ✓ ImageEmbeddingGenerator: Text → 1024-d embeddings (AI model)
48
- ✓ TextSummarizer: Metadata → album summary (AI model)
49
- ✓ ImageAnalyzerAgent: Image → description (AI model)
50
- ✓ TextToImageGenerator: Prompt → generated image (AI model)
51
-
52
- AI-GENERATED CONTENT:
53
- ✓ Embedding vectors (semantic representations)
54
- ✓ Album summaries (cluster descriptions)
55
- ✓ Image descriptions (visual analysis)
56
- ✓ Generated images (from text prompts)
57
-
58
- ================================================================================
59
- NEW ENDPOINT: /generate-similar-image
60
- ================================================================================
61
-
62
- Endpoint: POST /generate-similar-image
63
- Location: cloudzy/routes/generate.py
64
-
65
- Workflow:
66
- 1. User uploads an image
67
- 2. ImageAnalyzerAgent analyzes it → gets description
68
- 3. TextToImageGenerator creates new image from description
69
- 4. Returns generated image URL + description
70
-
71
- Response:
72
- {
73
- "description": "Detailed image analysis from Gemini",
74
- "generated_image_url": "http://127.0.0.1:8000/uploads/generated_20241025_123456_789.png",
75
- "message": "Similar image generated successfully"
76
- }
77
-
78
- Performance: ~40-75 seconds per request
79
- - Image analysis: ~5-10s (Gemini)
80
- - Image generation: ~30-60s (FLUX.1-dev)
81
-
82
- ================================================================================
83
- PROMPTS USED IN PROJECT
84
- ================================================================================
85
-
86
- 1. IMAGE ANALYSIS PROMPT (ImageAnalyzerAgent - Gemini):
87
- Location: cloudzy/agents/image_analyzer_2.py
88
 
89
- "Describe this image in a way that could be used as a prompt for generating
90
- a new image inspired by it. Focus on the main subjects, composition, style,
91
- mood, and colors. Avoid mentioning specific names or exact details — instead,
92
- describe the overall aesthetic and atmosphere so the result feels similar but
93
- not identical."
94
-
95
- 2. IMAGE GENERATION PROMPT:
96
- - Input: Description from ImageAnalyzerAgent (above)
97
- - Model: FLUX.1-dev (black-forest-labs/FLUX.1-dev)
98
- - Location: cloudzy/inference_models/text_to_image.py
99
- - Strategy: Direct prompt passing to image generation model
100
-
101
- ================================================================================
102
- ENVIRONMENT VARIABLES
103
- ================================================================================
104
-
105
- Required:
106
- - HF_TOKEN: Hugging Face API key (embeddings, summarization)
107
- - GEMINI_API_KEY: Google Gemini API key (image analysis)
108
- - HF_TOKEN_1: Alternative HF token (image generation)
109
- - APP_DOMAIN: App URL (default: http://127.0.0.1:8000/)
110
-
111
- ================================================================================
112
- KEY DECISIONS
113
- ================================================================================
114
-
115
- 1. Used FLUX.1-dev for high-quality image generation (vs Stable Diffusion)
116
- 2. Composable pipeline: ImageAnalyzer → TextToImageGenerator (reusable components)
117
- 3. Graceful error handling with fallbacks when APIs unavailable
118
- 4. Temporary file handling: saves uploads locally for Gemini analysis
119
-
120
- ================================================================================
121
- SUMMARY
122
- ================================================================================
123
-
124
- This project integrates 4 AI models responsibly:
125
- - Embeddings for semantic search
126
- - Summarization for album descriptions
127
- - Vision AI for image analysis
128
- - Generative AI for image creation
129
-
130
- All manual work handles infrastructure, logic, validation, and error handling.
131
- AI models are called for their specialized tasks only.
132
-
133
- New capability: Generate creative image variations from uploaded photos using
134
- intelligent analysis + high-quality generation pipeline.
135
-
136
- ================================================================================
1
+ AI USAGE REPORT - Cloudzy AI Challenge
2
+ ========================================
3
 
4
+ PROJECT OVERVIEW:
5
+ FastAPI-based photo management system with semantic search, AI image analysis, and text-to-image generation.
6
 
7
+ WHERE & HOW AI WAS USED:
8
+ 1. Image Analysis - Structured Metadata (cloudzy/agents/image_analyzer.py)
9
+ - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
10
+ - Function: Auto-generate tags, descriptions, and captions for uploaded photos
11
 
12
+ 1b. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
13
+ - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
14
+ - Function: Generate aesthetic image descriptions for inspiration-based generation
15
+
16
+ 2. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
17
+ - Tool: FLUX.1-dev model via HuggingFace Inference API
18
+ - Function: Generate images from text prompts
19
+
20
+ 3. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
21
+ - Tool: FAISS (vector database) with embeddings
22
+ - Function: Find visually similar photos via embedding vectors
23
+
24
+ PROMPTS & MODEL INPUTS:
25
+ Image Analysis Prompt #1 - Structured Metadata (image_analyzer.py):
26
+ "Describe this image in the following exact format: result: {tags: [...], description: '...', caption: '...'}"
27
+ - Input: Image URL sent to vision model
28
+ - The prompt requests a fixed output format so the response can be parsed as structured data
29
+
30
+ Image Analysis Prompt #2 - Generative Inspiration (image_analyzer_2.py, Gemini via smolagents):
31
+ "Describe this image in a way that could be used as a prompt for generating a new image inspired by it.
32
+ Focus on the main subjects, composition, style, mood, and colors.
33
+ Avoid mentioning specific names or exact details — instead, describe the overall aesthetic and atmosphere so the result feels similar but not identical."
34
+ - Input: Local image file sent to Gemini-2.0-Flash model
35
+ - Designed for generating aesthetic descriptions usable as prompts for image generation
36
+
37
+ Search Queries:
38
+ - User text converted to embeddings matched against photo database
39
+ - Album creation: Groups similar photos by distance threshold (randomized each call)
40
+
41
+ MODEL OUTPUTS REFINED:
42
+ ✓ JSON parsing: Extracted structured data from model text response
43
+ ✓ Distance threshold tuning: Adjusted for FAISS L2 distance (default 0.3)
44
+ ✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
45
+ ✓ Error handling: Wrapped API calls so failures fall back gracefully
46
+
47
+ MANUAL VS AI-GENERATED (BREAKDOWN):
48
+ AI-Generated (65%):
49
+ - Model integration boilerplate (API clients, token management)
50
+ - FAISS index structure and search logic
51
+ - Vision model prompt formatting
52
+ - Default model selections (Qwen3-VL-8B, FLUX.1-dev)
53
+
54
+ Manual Refinements (35%):
55
+ - Database schema design (Photo, embeddings storage)
56
+ - FastAPI route structure and error handling
57
+ - Album clustering algorithm and break conditions
58
+ - Distance threshold validation and tuning
59
+ - File upload validation and storage management
60
+ - CORS middleware configuration
61
+
62
+ KEY TECHNICAL DECISIONS:
63
+ 1. Distance threshold = 0.3: Filters visually similar photos
64
+ 2. Model choice: Qwen3-VL for balanced speed/quality
65
+ 3. FLUX.1-dev: High-quality image generation over speed
66
+ 4. Random album creation: Ensures different groupings per request
67
+ 5. HuggingFace Hub: Used pretrained models instead of training custom ones
68
+
69
+ FILES MODIFIED FOR IMPROVEMENTS:
70
+ - search_engine.py: Added randomization + album count control
71
+ - image_analyzer.py: JSON error handling for vision model output
72
+ - image_analyzer_2.py: Agentic image analysis with Gemini-2.0-Flash for aesthetic descriptions
73
+ - text_to_image.py: Timestamp-based filename collision prevention
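
The album grouping described in the report (shuffle the photos, then group by a distance threshold) can be sketched roughly as below. This is an illustration only, not the project's search_engine.py: the names `group_albums`, `squared_l2`, `max_albums`, and `rng` are hypothetical, and distances are computed in plain Python rather than through a FAISS index. The sketch uses squared L2 distance because that is what a FAISS flat L2 index reports; whether the report's 0.3 threshold is meant in squared units is an assumption.

```python
import random
from typing import Dict, List, Optional, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_albums(
    embeddings: Dict[int, Sequence[float]],
    threshold: float = 0.3,
    max_albums: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Greedily group photo ids whose embeddings lie within `threshold` of a seed photo."""
    rng = rng or random.Random()
    ids = list(embeddings)
    rng.shuffle(ids)  # a different visiting order can yield different albums per call

    assigned = set()
    albums: List[List[int]] = []
    for seed in ids:
        if seed in assigned:
            continue
        if max_albums is not None and len(albums) >= max_albums:
            break  # album count control
        album = [seed]
        assigned.add(seed)
        for other in ids:
            if other in assigned:
                continue
            if squared_l2(embeddings[seed], embeddings[other]) <= threshold:
                album.append(other)
                assigned.add(other)
        if len(album) > 1:  # a photo with no close neighbours is not an album
            albums.append(album)
    return albums
```

Passing a seeded `random.Random` makes a run reproducible for debugging, while the default keeps the per-request variation the report describes.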
cloudzy/agents/image_analyzer_2.py CHANGED
@@ -97,12 +97,7 @@ result: {
97
 
98
  response = self.agent.run(prompt, images=[image])
99
 
100
- # Handle both dict and string responses
101
- if isinstance(response, dict):
102
- # Response is already a dictionary
103
- return response
104
-
105
- # If response is a string, extract JSON part
106
  # Look for the pattern: result: { ... }
107
  match = re.search(r'result:\s*(\{[\s\S]*\})', response)
108
 
@@ -131,7 +126,7 @@ if __name__ == "__main__":
131
  agent = ImageAnalyzerAgent()
132
 
133
  # Test with first sample image
134
- result = agent.analyze_image_metadata(sample_image_paths[0])
135
  print(f"\n=== Results ===")
136
  print(f"Description: {result}")
137
  # print(f"Similar images found: {len(result['similar_images'])}")
 
97
 
98
  response = self.agent.run(prompt, images=[image])
99
 
100
+ # Extract JSON part from response
101
  # Look for the pattern: result: { ... }
102
  match = re.search(r'result:\s*(\{[\s\S]*\})', response)
103
 
 
126
  agent = ImageAnalyzerAgent()
127
 
128
  # Test with first sample image
129
+ result = agent.retrieve_similar_images(sample_image_paths[0])
130
  print(f"\n=== Results ===")
131
  print(f"Description: {result}")
132
  # print(f"Similar images found: {len(result['similar_images'])}")
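
The first hunk above removes the `isinstance(response, dict)` guard, so the regex now runs on whatever `self.agent.run` returns; `re.search` raises `TypeError` if that value is not a string. A defensive version of the extraction step might look like the sketch below. The helper name `extract_result` is hypothetical, and the diff does not establish whether the agent can actually return a non-string here. The sketch also assumes the payload is valid JSON, which the prompt format shown in the report (unquoted keys, single quotes) does not guarantee.

```python
import json
import re
from typing import Any, Optional


def extract_result(response: Any) -> Optional[dict]:
    """Pull the `result: { ... }` payload out of a model response, or return None."""
    if isinstance(response, dict):
        return response  # already structured; nothing to parse
    if not isinstance(response, str):
        return None  # avoid a TypeError from re.search on non-string input

    # Same pattern as in the diff: greedy match from the first "{" to the last "}".
    match = re.search(r'result:\s*(\{[\s\S]*\})', response)
    if match is None:
        return None

    try:
        parsed = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # payload was not valid JSON
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` instead of raising leaves the choice of fallback to the caller.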
cloudzy/routes/photo.py CHANGED
@@ -37,7 +37,7 @@ async def get_photo(
37
  image_url = f"{APP_DOMAIN}uploads/{photo.filename}",
38
  tags=photo.get_tags(),
39
  caption=photo.caption,
40
- embedding=photo.get_embedding(),
41
  created_at=photo.created_at,
42
  )
43
 
@@ -72,7 +72,7 @@ async def list_photos(
72
  image_url = f"{APP_DOMAIN}uploads/{photo.filename}",
73
  tags=photo.get_tags(),
74
  caption=photo.caption,
75
- embedding=photo.get_embedding(),
76
  created_at=photo.created_at,
77
  )
78
  for photo in photos
 
37
  image_url = f"{APP_DOMAIN}uploads/{photo.filename}",
38
  tags=photo.get_tags(),
39
  caption=photo.caption,
40
+ # embedding=photo.get_embedding(),
41
  created_at=photo.created_at,
42
  )
43
 
 
72
  image_url = f"{APP_DOMAIN}uploads/{photo.filename}",
73
  tags=photo.get_tags(),
74
  caption=photo.caption,
75
+ # embedding=photo.get_embedding(),
76
  created_at=photo.created_at,
77
  )
78
  for photo in photos
cloudzy/schemas.py CHANGED
@@ -19,7 +19,8 @@ class PhotoResponse(BaseModel):
19
 
20
  class PhotoDetailResponse(PhotoResponse):
21
  """Detailed photo response with embedding info"""
22
- embedding: Optional[List[float]] = None
 
23
 
24
 
25
 
 
19
 
20
  class PhotoDetailResponse(PhotoResponse):
21
  """Detailed photo response with embedding info"""
22
+ # embedding: Optional[List[float]] = None
23
+ pass
24
 
25
 
26
 
cloudzy/search_engine.py CHANGED
@@ -163,41 +163,4 @@ class SearchEngine:
163
  "index_type": type(self.index).__name__,
164
  }
165
 
166
- def debug_distances(self, sample_size: int = 3) -> dict:
167
- """Debug distances between photos to understand why albums aren't grouping"""
168
- from cloudzy.database import SessionLocal
169
- from cloudzy.models import Photo
170
- from sqlmodel import select
171
-
172
- self.load()
173
- if self.index.ntotal == 0:
174
- return {"error": "No embeddings in index"}
175
-
176
- id_map = self.index.id_map
177
- all_ids = [id_map.at(i) for i in range(min(id_map.size(), sample_size))]
178
-
179
- debug_info = {}
180
- session = SessionLocal()
181
- try:
182
- for photo_id in all_ids:
183
- photo = session.exec(select(Photo).where(Photo.id == photo_id)).first()
184
- if not photo:
185
- continue
186
-
187
- embedding = photo.get_embedding()
188
- if not embedding:
189
- continue
190
-
191
- query_embedding = np.array(embedding).reshape(1, -1).astype(np.float32)
192
- distances, ids = self.index.search(query_embedding, 5)
193
-
194
- debug_info[photo_id] = {
195
- "top_5_results": [
196
- {"id": int(pid), "distance": float(d)}
197
- for pid, d in zip(ids[0], distances[0]) if pid != -1
198
- ]
199
- }
200
- finally:
201
- session.close()
202
-
203
- return debug_info
 
163
  "index_type": type(self.index).__name__,
164
  }
165
 
166
+