Spaces: userx2000/cloudzy_ai_challenge

matinsn2000 committed on Oct 27

Commit f22d5ac

1 Parent(s): c2cd7f1

Removed album summary

Files changed (3)

AI_USAGE_REPORT.txt +4 -15
cloudzy/routes/photo.py +8 -8
cloudzy/schemas.py +1 -1

AI_USAGE_REPORT.txt CHANGED

@@ -9,15 +9,15 @@ WHERE & HOW AI WAS USED:
    - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
    - Function: Auto-generate tags, descriptions, and captions for uploaded photos
-1b. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
    - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
    - Function: Generate aesthetic image descriptions for inspiration-based generation
-2. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
    - Tool: FLUX.1-dev model via HuggingFace Inference API
    - Function: Generate images from text prompts
-3. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
    - Tool: FAISS (vector database) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional)
    - Function: Find visually similar photos via L2-normalized embedding vectors
@@ -41,8 +41,6 @@ Search Queries:
 MODEL OUTPUTS REFINED:
 ✓ JSON parsing: Extracted structured data from model text response (with dict type-check for Gemini responses)
 ✓ Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d)
-✓ L2 normalization: Added unit-vector normalization to embeddings for consistent distance calculations
-✓ Distance threshold tuning: Adjusted for normalized embeddings (0.5 → 1.0 for search, 0.3 → 1.5 for albums)
 ✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
 ✓ Error handling: Wrapped API failures to graceful fallbacks
@@ -63,17 +61,8 @@ Manual Refinements (35%):
 KEY TECHNICAL DECISIONS:
 1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models
-2. L2 normalization: Ensures normalized distances (0-2 range) independent of embedding dimension
-3. Distance thresholds: search() ≤ 1.0, create_albums() ≤ 1.5 (optimized for normalized embeddings)
 4. Model choice: Qwen3-VL for balanced speed/quality in image analysis
 5. FLUX.1-dev: High-quality image generation over speed
 6. Random album creation: Ensures different groupings per request
 7. HuggingFace Hub: Leveraged pre-tuned models vs training custom
-FILES MODIFIED FOR IMPROVEMENTS:
-- ai_utils.py: Added L2 normalization to both generate_embedding() and _embed_text() methods
-- search_engine.py: Updated distance thresholds (0.5→1.0 search, 0.3→1.5 albums) for normalized embeddings
-- image_analyzer.py: JSON error handling for vision model output
-- image_analyzer_2.py: Dict type-check for Gemini responses + agentic image analysis with Gemini-2.0-Flash
-- text_to_image.py: Timestamp-based filename collision prevention

    - Tool: Qwen/Qwen3-VL-8B-Instruct model via HuggingFace API
    - Function: Auto-generate tags, descriptions, and captions for uploaded photos
+2. Image Analysis - Aesthetic Descriptions (cloudzy/agents/image_analyzer_2.py)
    - Tool: Gemini-2.0-Flash model via smolagents + OpenAI-compatible API
    - Function: Generate aesthetic image descriptions for inspiration-based generation
+3. Text-to-Image Generation (cloudzy/inference_models/text_to_image.py)
    - Tool: FLUX.1-dev model via HuggingFace Inference API
    - Function: Generate images from text prompts
+4. Semantic Search (cloudzy/search_engine.py + cloudzy/routes/search.py)
    - Tool: FAISS (vector database) with embeddings from Qwen/Qwen3-Embedding-8B (4096-dimensional)
    - Function: Find visually similar photos via L2-normalized embedding vectors
 MODEL OUTPUTS REFINED:
 ✓ JSON parsing: Extracted structured data from model text response (with dict type-check for Gemini responses)
 ✓ Embedding model upgrade: Migrated from multilingual-e5-large (1024-d) to Qwen3-Embedding-8B (4096-d)
 ✓ Album randomization: Added random.shuffle() to prevent deterministic groupings
 ✓ Error handling: Wrapped API failures to graceful fallbacks
 KEY TECHNICAL DECISIONS:
 1. Embedding model: Qwen3-Embedding-8B (4096-d) for better semantic understanding than smaller models
+3. Distance thresholds: search() ≤ 1.5, create_albums() ≤ 1.5 (optimized for normalized embeddings)
 4. Model choice: Qwen3-VL for balanced speed/quality in image analysis
 5. FLUX.1-dev: High-quality image generation over speed
 6. Random album creation: Ensures different groupings per request
 7. HuggingFace Hub: Leveraged pre-tuned models vs training custom
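
The report ties its distance thresholds to L2-normalized embeddings, and the removed lines describe a 0-2 distance range. A minimal sketch of why that range holds, and of what a threshold such as 1.5 would admit, is given below. It uses only the standard library; the helper names (`l2_normalize`, `l2_distance`, `cosine_similarity`, `cosine_floor`) are hypothetical and are not taken from `ai_utils.py` or `search_engine.py`. Whether the project compares against the Euclidean distance or its square is not shown in this diff, so the `squared` flag is an assumption (FAISS's `IndexFlatL2` reports squared distances).

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain (non-squared) Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_floor(threshold: float, squared: bool = False) -> float:
    """Lowest cosine similarity that passes `distance <= threshold` on unit vectors.

    For unit vectors, d^2 = 2 - 2*cos, so cos = 1 - d^2 / 2.
    Set squared=True if the threshold is applied to squared distances.
    """
    d2 = threshold if squared else threshold ** 2
    return 1.0 - d2 / 2.0


u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
print(l2_distance(u, v) ** 2, 2 - 2 * cosine_similarity(u, v))  # the two agree
print(cosine_floor(1.5), cosine_floor(1.5, squared=True))
```

Under these assumptions, a threshold of 1.5 on the plain distance accepts pairs with cosine similarity as low as -0.125, while the same number applied to squared distances corresponds to a floor of 0.25, so the two readings of the threshold differ considerably in strictness.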

cloudzy/routes/photo.py CHANGED

@@ -141,18 +141,18 @@ async def get_albums(
             )
             # Collect descriptions for album summary
-            if photo.caption:
-                album_descriptions.append(photo.caption)
-            tags = photo.get_tags()
-            if tags:
-                album_descriptions.append(" ".join(tags))
         # Generate album summary from compiled descriptions
-        combined_description = " ".join(album_descriptions)
-        album_summary = summarizer.summarize(combined_description)
         albums_response.append(
-            AlbumItem(album_summary=album_summary, album=album_photos)
         )
     return albums_response

             )
             # Collect descriptions for album summary
+            # if photo.caption:
+            #     album_descriptions.append(photo.caption)
+            # tags = photo.get_tags()
+            # if tags:
+            #     album_descriptions.append(" ".join(tags))
         # Generate album summary from compiled descriptions
+        # combined_description = " ".join(album_descriptions)
+        # album_summary = summarizer.summarize(combined_description)
         albums_response.append(
+            AlbumItem( album=album_photos)
         )
     return albums_response

cloudzy/schemas.py CHANGED

@@ -66,7 +66,7 @@ class PhotoItem(BaseModel):
     distance: float
 class AlbumItem(BaseModel):
-    album_summary: str
     album: List[PhotoItem]
 AlbumsResponse = List[AlbumItem]

     distance: float
 class AlbumItem(BaseModel):
+    # album_summary: str
     album: List[PhotoItem]
 AlbumsResponse = List[AlbumItem]
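
The route and schema changes above disable the album summary by commenting code out. One possible alternative, sketched here under stated assumptions, is to keep the field but make it optional, so the summary can be switched off without leaving dead code behind. Everything in the sketch is a stand-in: `Photo` and `AlbumItem` are plain dataclasses rather than the project's SQL model and Pydantic schema, and `collect_descriptions` and `build_album` are hypothetical helpers that mirror the commented-out logic rather than functions in the repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Photo:
    """Hypothetical stand-in for the project's photo model."""
    caption: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def get_tags(self) -> List[str]:
        return self.tags


@dataclass
class AlbumItem:
    """Stand-in for the schema; album_summary is optional instead of removed."""
    album: List[Photo]
    album_summary: Optional[str] = None


def collect_descriptions(photos: List[Photo]) -> str:
    """Join captions and tags into one string, as the commented-out code did."""
    parts: List[str] = []
    for photo in photos:
        if photo.caption:
            parts.append(photo.caption)
        tags = photo.get_tags()
        if tags:
            parts.append(" ".join(tags))
    return " ".join(parts)


def build_album(
    photos: List[Photo],
    summarize: Optional[Callable[[str], str]] = None,
) -> AlbumItem:
    """Build an album; the summary is filled only when a summarizer is passed."""
    if summarize is None:
        return AlbumItem(album=photos)
    text = collect_descriptions(photos)
    return AlbumItem(album=photos, album_summary=summarize(text) if text else None)


photos = [Photo(caption="Beach at sunset", tags=["sea", "sand"]), Photo(tags=["dog"])]
print(build_album(photos).album_summary)                       # no summarizer given
print(build_album(photos, summarize=str.upper).album_summary)  # toy summarizer
```

In the real schema the equivalent change would presumably be an optional field with a `None` default on the Pydantic model, which keeps existing clients that read `album_summary` from breaking on a missing key.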