Commit · f22d5ac
1 Parent(s): c2cd7f1
Removed album summary
Files changed:
- AI_USAGE_REPORT.txt +4 -15
- cloudzy/routes/photo.py +8 -8
- cloudzy/schemas.py +1 -1
AI_USAGE_REPORT.txt CHANGED
@@ -9,15 +9,15 @@ WHERE & HOW AI WAS USED:
 - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
 - Function: Auto-generate tags, descriptions, and captions for uploaded photos
 
-
+2. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
 - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
 - Function: Generate aesthetic image descriptions for inspiration-based generation
 
-
+3. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
 - Tool: FLUX.1-dev model via HuggingFace Inference API
 - Function: Generate images from text prompts
 
-
+4. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
 - Tool: FAISS (vector database) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional)
 - Function: Find visually similar photos via L2-normalized embedding vectors
 
@@ -41,8 +41,6 @@ Search Queries:
 MODEL OUTPUTS REFINED:
 ✓ JSON parsing: Extracted structured data from model text response (with dict type-check for Gemini responses)
 ✓ Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d)
-✓ L2 normalization: Added unit-vector normalization to embeddings for consistent distance calculations
-✓ Distance threshold tuning: Adjusted for normalized embeddings (0.5 → 1.0 for search, 0.3 → 1.5 for albums)
 ✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
 ✓ Error handling: Wrapped API failures to graceful fallbacks
 
@@ -63,17 +61,8 @@ Manual Refinements (35%):
 
 KEY TECHNICAL DECISIONS:
 1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models
-
-3. Distance thresholds: search() ≤ 1.0, create_albums() ≤ 1.5 (optimized for normalized embeddings)
+3. Distance thresholds: search() ≤ 1.5, create_albums() ≤ 1.5 (optimized for normalized embeddings)
 4. Model choice: Qwen3-VL for balanced speed/quality in image analysis
 5. FLUX.1-dev: High-quality image generation over speed
 6. Random album creation: Ensures different groupings per request
 7. HuggingFace Hub: Leveraged pre-tuned models vs training custom
-
-FILES MODIFIED FOR IMPROVEMENTS:
-- ai_utils.py: Added L2 normalization to both generate_embedding() and _embed_text() methods
-- search_engine.py: Updated distance thresholds (0.5→1.0 search, 0.3→1.5 albums) for normalized embeddings
-- image_analyzer.py: JSON error handling for vision model output
-- image_analyzer_2.py: Dict type-check for Gemini responses + agentic image analysis with Gemini-2.0-Flash
-- text_to_image.py: Timestamp-based filename collision prevention
-
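The report ties its distance thresholds to L2-normalized embeddings. The following is a minimal sketch of why that matters, not the project's actual code: for unit vectors, squared L2 distance equals 2 - 2·cos, so a threshold maps directly to a minimum cosine similarity. It assumes the index reports squared L2 distance (as FAISS `IndexFlatL2` does); the report does not say which index type the project uses, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def threshold_to_min_cosine(threshold: float) -> float:
    """For unit vectors, d^2 = 2 - 2*cos, so d^2 <= t means cos >= 1 - t/2."""
    return 1.0 - threshold / 2.0


def within_threshold(query: Sequence[float],
                     candidate: Sequence[float],
                     threshold: float = 1.5) -> bool:
    """Hypothetical filter: keep a candidate whose squared L2 distance
    to the query, after normalization, does not exceed the threshold."""
    q, c = l2_normalize(query), l2_normalize(candidate)
    return squared_l2(q, c) <= threshold
```

Under these assumptions a threshold of 1.5 admits any pair with cosine similarity of at least 0.25, while the earlier search threshold of 1.0 required at least 0.5. Orthogonal vectors sit at a squared distance of 2.0 and are rejected by both.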
cloudzy/routes/photo.py CHANGED
@@ -141,18 +141,18 @@ async def get_albums(
 )
 
 # Collect descriptions for album summary
-if photo.caption:
-    album_descriptions.append(photo.caption)
-tags = photo.get_tags()
-if tags:
-    album_descriptions.append(" ".join(tags))
+# if photo.caption:
+# album_descriptions.append(photo.caption)
+# tags = photo.get_tags()
+# if tags:
+# album_descriptions.append(" ".join(tags))
 
 # Generate album summary from compiled descriptions
-combined_description = " ".join(album_descriptions)
-album_summary = summarizer.summarize(combined_description)
+# combined_description = " ".join(album_descriptions)
+# album_summary = summarizer.summarize(combined_description)
 
 albums_response.append(
-    AlbumItem(
+    AlbumItem( album=album_photos)
 )
 
 return albums_response
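The handler change disables the summary by commenting out the description-collection and summarization steps. One alternative, sketched here under stated assumptions and not taken from the repository, is to make the summarizer an optional argument so that the step can be switched off without leaving dead code. `Photo`, `collect_descriptions`, and `build_album` are hypothetical stand-ins; only `caption`, `get_tags()`, and the joining logic mirror what the diff shows.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Photo:
    """Hypothetical stand-in for the project's photo model."""
    caption: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def get_tags(self) -> List[str]:
        return self.tags


def collect_descriptions(photos: List[Photo]) -> List[str]:
    """Gather captions and space-joined tags, as the disabled code did."""
    descriptions: List[str] = []
    for photo in photos:
        if photo.caption:
            descriptions.append(photo.caption)
        tags = photo.get_tags()
        if tags:
            descriptions.append(" ".join(tags))
    return descriptions


def build_album(photos: List[Photo],
                summarize: Optional[Callable[[str], str]] = None) -> Dict[str, object]:
    """Build one album entry; add a summary only when a summarizer is given."""
    album: Dict[str, object] = {"album": photos}
    if summarize is not None:
        combined = " ".join(collect_descriptions(photos))
        album["album_summary"] = summarize(combined)
    return album
```

Calling `build_album(photos)` reproduces the post-commit behavior (no summary), while passing a callable restores it without touching the handler body.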
cloudzy/schemas.py CHANGED
@@ -66,7 +66,7 @@ class PhotoItem(BaseModel):
     distance: float
 
 class AlbumItem(BaseModel):
-    album_summary: str
+    # album_summary: str
     album: List[PhotoItem]
 
 AlbumsResponse = List[AlbumItem]
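Commenting out `album_summary` removes the field from the response schema entirely. If clients may still read that key, a possible alternative is to keep it as an optional field that defaults to `None`. This is a sketch only, assuming Pydantic is the `BaseModel` in use (the diff does not show the import); `PhotoItem` is reduced here to the one field visible in the hunk.

```python
from typing import List, Optional

from pydantic import BaseModel


class PhotoItem(BaseModel):
    # Only the field visible in the diff; the real model has other
    # fields above line 66 that are not shown in the hunk.
    distance: float


class AlbumItem(BaseModel):
    # Optional with a default, so responses validate with or without a summary.
    album_summary: Optional[str] = None
    album: List[PhotoItem]


AlbumsResponse = List[AlbumItem]
```

With this shape, `AlbumItem(album=album_photos)` from the handler still validates, and the summary can be reintroduced later without another schema change.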