matinsn2000 committed on
Commit f22d5ac · 1 Parent(s): c2cd7f1

Removed album summary

AI_USAGE_REPORT.txt CHANGED
@@ -9,15 +9,15 @@ WHERE & HOW AI WAS USED:
  - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
  - Function: Auto-generate tags, descriptions, and captions for uploaded photos
 
- 1b. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
+ 2. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
  - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
  - Function: Generate aesthetic image descriptions for inspiration-based generation
 
- 2. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
+ 3. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
  - Tool: FLUX.1-dev model via HuggingFace Inference API
  - Function: Generate images from text prompts
 
- 3. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
+ 4. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
  - Tool: FAISS (vector database) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional)
  - Function: Find visually similar photos via L2-normalized embedding vectors
 
@@ -41,8 +41,6 @@ Search Queries:
  MODEL OUTPUTS REFINED:
  ✓ JSON parsing: Extracted structured data from model text response (with dict type-check for Gemini responses)
  ✓ Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d)
- ✓ L2 normalization: Added unit-vector normalization to embeddings for consistent distance calculations
- ✓ Distance threshold tuning: Adjusted for normalized embeddings (0.5 → 1.0 for search, 0.3 → 1.5 for albums)
  ✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
  ✓ Error handling: Wrapped API failures to graceful fallbacks
 
@@ -63,17 +61,8 @@ Manual Refinements (35%):
 
  KEY TECHNICAL DECISIONS:
  1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models
- 2. L2 normalization: Ensures normalized distances (0-2 range) independent of embedding dimension
- 3. Distance thresholds: search() ≤ 1.0, create_albums() ≤ 1.5 (optimized for normalized embeddings)
+ 3. Distance thresholds: search() ≤ 1.5, create_albums() ≤ 1.5 (optimized for normalized embeddings)
  4. Model choice: Qwen3-VL for balanced speed/quality in image analysis
  5. FLUX.1-dev: High-quality image generation over speed
  6. Random album creation: Ensures different groupings per request
  7. HuggingFace Hub: Leveraged pre-tuned models vs training custom
-
- FILES MODIFIED FOR IMPROVEMENTS:
- - ai_utils.py: Added L2 normalization to both generate_embedding() and _embed_text() methods
- - search_engine.py: Updated distance thresholds (0.5→1.0 search, 0.3→1.5 albums) for normalized embeddings
- - image_analyzer.py: JSON error handling for vision model output
- - image_analyzer_2.py: Dict type-check for Gemini responses + agentic image analysis with Gemini-2.0-Flash
- - text_to_image.py: Timestamp-based filename collision prevention
-
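
Note on the report's L2 normalization and distance-threshold wording: the idea can be illustrated with a minimal sketch in plain Python. It assumes embeddings are plain lists of floats. The helper names `l2_normalize`, `l2_distance`, and `within_threshold` are hypothetical and are not taken from ai_utils.py or search_engine.py; the default threshold of 1.5 simply mirrors the value quoted in the report.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale vec to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (not squared) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_threshold(
    query: Sequence[float],
    candidate: Sequence[float],
    threshold: float = 1.5,  # assumption: mirrors the value quoted in the report
) -> bool:
    """Hypothetical filter: keep a candidate whose normalized distance is <= threshold."""
    return l2_distance(l2_normalize(query), l2_normalize(candidate)) <= threshold


if __name__ == "__main__":
    same = l2_normalize([3.0, 4.0])
    opposite = l2_normalize([-3.0, -4.0])
    print(l2_distance(same, same))      # identical direction: smallest distance
    print(l2_distance(same, opposite))  # opposite direction: largest distance
```

For unit vectors the squared distance equals 2 - 2·cos(angle), so the Euclidean distance stays between 0 and 2 whatever the embedding dimension. If an index reports squared L2 distances instead (as FAISS's IndexFlatL2 does), the range becomes 0 to 4, so it is worth checking which quantity a threshold is compared against.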
 
 
 
 
 
 
 
 
cloudzy/routes/photo.py CHANGED
@@ -141,18 +141,18 @@ async def get_albums(
  )
 
  # Collect descriptions for album summary
- if photo.caption:
- album_descriptions.append(photo.caption)
- tags = photo.get_tags()
- if tags:
- album_descriptions.append(" ".join(tags))
+ # if photo.caption:
+ # album_descriptions.append(photo.caption)
+ # tags = photo.get_tags()
+ # if tags:
+ # album_descriptions.append(" ".join(tags))
 
  # Generate album summary from compiled descriptions
- combined_description = " ".join(album_descriptions)
- album_summary = summarizer.summarize(combined_description)
+ # combined_description = " ".join(album_descriptions)
+ # album_summary = summarizer.summarize(combined_description)
 
  albums_response.append(
- AlbumItem(album_summary=album_summary, album=album_photos)
+ AlbumItem( album=album_photos)
  )
 
  return albums_response
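
Note on the commented-out summary code: one possible alternative to leaving dead lines in the route is to make the summarizer optional, so the summary is skipped whenever none is supplied. The sketch below uses stdlib stand-ins only. `Photo`, `Album`, `collect_descriptions`, `build_album`, and the `summarize` callable are hypothetical and are not the project's actual classes; only the names `caption`, `get_tags()`, `album`, and `album_summary` follow the diff above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Photo:
    """Stand-in for the project's photo model (hypothetical)."""
    caption: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def get_tags(self) -> List[str]:
        return self.tags


@dataclass
class Album:
    """Stand-in for AlbumItem, with the summary made optional."""
    album: List[Photo]
    album_summary: Optional[str] = None


def collect_descriptions(photos: List[Photo]) -> str:
    """Join captions and tags into one text blob, as the removed lines did."""
    parts: List[str] = []
    for photo in photos:
        if photo.caption:
            parts.append(photo.caption)
        tags = photo.get_tags()
        if tags:
            parts.append(" ".join(tags))
    return " ".join(parts)


def build_album(
    photos: List[Photo],
    summarize: Optional[Callable[[str], str]] = None,
) -> Album:
    """Attach a summary only when a summarizer is supplied and there is text."""
    summary = None
    if summarize is not None:
        text = collect_descriptions(photos)
        if text:
            summary = summarize(text)
    return Album(album=photos, album_summary=summary)
```

Calling `build_album(photos)` without a summarizer yields an album whose `album_summary` is `None`, which lets a client distinguish "no summary" from an empty one without a schema change each time the feature is toggled.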
cloudzy/schemas.py CHANGED
@@ -66,7 +66,7 @@ class PhotoItem(BaseModel):
  distance: float
 
  class AlbumItem(BaseModel):
- album_summary: str
+ # album_summary: str
  album: List[PhotoItem]
 
  AlbumsResponse = List[AlbumItem]