# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Hugging Face Space that serves as the **complete backend** for the Piclets Discovery game. It orchestrates AI services, handles Piclet generation, and manages persistent storage.

**Core Concept**: Each real-world object has ONE canonical Piclet! Players scan objects with photos, and the server generates Pokemon-style creatures using AI, tracking canonical discoveries and variations (e.g., "velvet pillow" is a variation of the canonical "pillow").

**Architecture Philosophy**: The server handles ALL AI orchestration securely. The frontend is a pure UI that makes a single API call. This prevents client-side manipulation and ensures fair play.

## Architecture

### Storage System
- **HuggingFace Dataset**: `Fraser/piclets` (public dataset repository)
- **Structure**:
  ```
  piclets/
    {normalized_object_name}.json  # e.g., pillow.json
  users/
    {username}.json                # User profiles
  metadata/
    stats.json                     # Global statistics
    leaderboard.json               # Top discoverers
  ```

### Object Normalization
Objects are normalized for consistent storage:
- Convert to lowercase
- Remove articles (the, a, an)
- Handle pluralization (pillows → pillow)
- Replace spaces with underscores
- Remove special characters

Examples:
- "The Blue Pillow" → `pillow`
- "wooden chairs" → `wooden_chair`
- "glasses" → `glass` (special case handling)

### Piclet Data Structure
```json
{
  "canonical": {
    "objectName": "pillow",
    "typeId": "pillow_canonical",
    "discoveredBy": "username",
    "discoveredAt": "2024-07-26T10:30:00",
    "scanCount": 42,
    "picletData": {
      // Full Piclet instance data
    }
  },
  "variations": [
    {
      "typeId": "pillow_001",
      "attributes": ["velvet", "blue"],
      "discoveredBy": "username2",
      "discoveredAt": "2024-07-26T11:00:00",
      "scanCount": 5,
      "picletData": {
        // Full variation data
      }
    }
  ]
}
```

## API Endpoints

The frontend only needs these **5 public endpoints**:

### 1. **generate_piclet** (Scanner)
Complete Piclet generation workflow - the main endpoint.
- **Input**:
  - `image`: User's photo (File)
  - `hf_token`: User's HuggingFace OAuth token (string)
- **Process**:
  1. Verifies `hf_token` → gets user info
  2. Uses token to connect to **JoyCaption** → generates detailed image description
  3. Uses token to call **GPT-OSS-120B** → generates Pokemon concept (object, variation, stats, description)
  4. Parses concept to extract structured data
  5. Uses token to call **Flux-Schnell** → generates Piclet image
  6. Checks dataset for canonical/variation match
  7. Saves to dataset with user attribution
  8. Updates user profile (discoveries, rarity score)
- **Returns**:
  ```json
  {
    "success": true,
    "piclet": {/* complete Piclet data */},
    "discoveryStatus": "new" | "variation" | "existing",
    "canonicalId": "pillow_canonical",
    "message": "Congratulations! You discovered the first pillow Piclet!"
  }
  ```
- **Security**: Uses user's token to call AI services, consuming THEIR GPU quota (not the server's)

### 2. **get_user_piclets** (User Collection)
Get user's discovered Piclets and stats.
- **Input**: `hf_token` (string)
- **Returns**:
  ```json
  {
    "success": true,
    "piclets": [{/* list of discoveries */}],
    "stats": {
      "username": "...",
      "totalFinds": 42,
      "uniqueFinds": 15,
      "rarityScore": 1250
    }
  }
  ```

### 3. **get_object_details** (Object Data)
Get complete object information (canonical + all variations).
- **Input**: `object_name` (string, e.g., "pillow", "macbook")
- **Returns**:
  ```json
  {
    "success": true,
    "objectName": "pillow",
    "canonical": {/* canonical data */},
    "variations": [{/* variation 1 */}, {/* variation 2 */}],
    "totalScans": 157,
    "variationCount": 8
  }
  ```

### 4. **get_recent_activity** (Activity Feed)
Recent discoveries across all users.
- **Input**: `limit` (int, default 20)
- **Returns**: List of recent discoveries with timestamps

### 5. **get_leaderboard** (Top Users)
Top discoverers by rarity score.
- **Input**: `limit` (int, default 10)
- **Returns**: Ranked users with stats

---

**Internal Functions** (not exposed to frontend):
- `search_piclet()`, `create_canonical()`, `create_variation()`, `increment_scan_count()`
- Used internally by `generate_piclet()`

## Rarity System

Scan count determines rarity:
- **Legendary**: ≤ 5 scans
- **Epic**: 6-20 scans
- **Rare**: 21-50 scans
- **Uncommon**: 51-100 scans
- **Common**: > 100 scans

Rarity scoring for leaderboard:
- Canonical discovery: +100 points
- Variation discovery: +50 points
- Additional bonuses based on rarity tier

## Authentication Strategy

**Web UI Authentication**:
- Gradio `auth` protects web interface from casual access
- Requires username="admin" and password from `ADMIN_PASSWORD` env var
- Prevents random users from manually creating piclets via UI
- **Does NOT affect API access** - programmatic clients bypass this

**API-Level Authentication**:
- OAuth token verification for user attribution
- Tokens verified via `https://huggingface.co/oauth/userinfo`
- User profiles keyed by stable HF `sub` (user ID)
- All discovery data is public (embracing open discovery)

## Integration with Frontend

The frontend (`../piclets/`) uses these **5 simple API calls**:

```javascript
// Connect to server
const client = await window.gradioClient.Client.connect("Fraser/piclets-server");

// 1. Scanner - Generate complete Piclet (ONE CALL - server does everything!)
const scanResult = await client.predict("/generate_piclet", {
  image: imageFile,
  hf_token: userToken
});
const { success, piclet, discoveryStatus, message } = scanResult.data[0];

// 2. User Collection - Get user's Piclets + stats
const myPiclets = await client.predict("/get_user_piclets", {
  hf_token: userToken
});
const { piclets, stats } = myPiclets.data[0];

// 3. Object Details - Get object info (canonical + variations)
const objectInfo = await client.predict("/get_object_details", {
  object_name: "pillow"
});
const { canonical, variations, totalScans } = objectInfo.data[0];

// 4. Activity Feed - Get recent discoveries
const activity = await client.predict("/get_recent_activity", {
  limit: 20
});

// 5. Leaderboard - Get top users
const leaders = await client.predict("/get_leaderboard", {
  limit: 10
});
```

**Why This Design?**
- **Clean API**: Only 5 endpoints, each with a clear purpose
- **Security**: All AI orchestration happens server-side (can't be manipulated)
- **Simplicity**: Frontend is pure UI, no complex orchestration logic
- **Fairness**: Uses user's GPU quota, not server's
- **Reliability**: Server handles retries and error recovery

## Development

### Local Testing
```bash
pip install -r requirements.txt
python app.py
# Access at http://localhost:7860
```

### Deployment
Push to HuggingFace Space repository:
```bash
git add -A && git commit -m "Update" && git push
```

### Environment Variables
- `HF_TOKEN`: **Required** - HuggingFace write token for dataset operations (set in Space Secrets)
- `ADMIN_PASSWORD`: Optional - Password for web UI access (set in Space Secrets)
- `DATASET_REPO`: Target dataset (default: "Fraser/piclets")

Note: Users' `hf_token` (passed in API calls) is separate from server's `HF_TOKEN` (for dataset writes).

## Key Implementation Details

### AI Service Integration
The server uses `gradio_client` to call external AI services with the user's token:
- **JoyCaption** (`fancyfeast/joy-caption-alpha-two`): Detailed image captioning with brand/model recognition
- **GPT-OSS-120B** (`amd/gpt-oss-120b-chatbot`): Concept generation and parsing
- **Flux-Schnell** (`black-forest-labs/FLUX.1-schnell`): Anime-style Piclet image generation

Each service is called with the user's `hf_token`, consuming their GPU quota.
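
As a rough illustration of the pattern described above, the sketch below wraps a `gradio_client` call so that every request to an AI Space is made with the user's token rather than the server's. The helper names (`call_space`, `default_client_factory`) and the `api_name` value in the usage note are hypothetical and not taken from this repository. The `hf_token` keyword is assumed to match the installed `gradio_client` version, so check it against your environment.

```python
from typing import Any, Callable

# Space IDs as listed in this document.
AI_SPACES = {
    "caption": "fancyfeast/joy-caption-alpha-two",
    "concept": "amd/gpt-oss-120b-chatbot",
    "image": "black-forest-labs/FLUX.1-schnell",
}


def default_client_factory(space_id: str, hf_token: str) -> Any:
    """Connect to a Space as the user (assumes gradio_client is installed)."""
    # Imported lazily so this module still loads where gradio_client is absent.
    from gradio_client import Client

    # Assumption: keyword name as used by gradio_client 1.x.
    return Client(space_id, hf_token=hf_token)


def call_space(
    service: str,
    hf_token: str,
    *args: Any,
    api_name: str,
    client_factory: Callable[[str, str], Any] = default_client_factory,
) -> Any:
    """Call one AI Space with the user's token so the user's quota is consumed."""
    if service not in AI_SPACES:
        raise ValueError(f"Unknown AI service: {service!r}")
    if not hf_token:
        raise ValueError("hf_token is required to call AI services")

    client = client_factory(AI_SPACES[service], hf_token)
    # api_name is supplied by the caller; this sketch does not know the
    # real endpoint names exposed by each Space.
    return client.predict(*args, api_name=api_name)
```

A call might look like `call_space("caption", user_token, image_path, api_name="/hypothetical_endpoint")`, with the real endpoint name read from the Space's API page. Passing the client factory as a parameter keeps the helper testable without network access.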

### Concept Parsing
GPT-OSS generates structured markdown with sections:
- Canonical Object (specific brand/model, not generic)
- Variation (distinctive attribute or "canonical")
- Object Rarity (determines tier)
- Monster Name, Type, Stats
- Physical Stats (height, weight)
- Personality, Description
- Monster Image Prompt

The parser uses regex to extract each section and clean the data.

### Variation Matching
- Uses set intersection to find attribute overlap
- 50% match threshold for variations
- Attributes are normalized and trimmed

### Caching Strategy
- Local cache in `cache/` directory
- HuggingFace hub caching for downloads
- Temporary files for uploads

### Error Handling
- Token verification before any operations
- Graceful fallbacks for missing data
- Default user profiles for new users
- `try`/`except` blocks around all operations
- Detailed logging for debugging

## Future Enhancements

1. **Background Removal**: Add server-side background removal (currently done on frontend)
2. **Activity Log**: Separate timeline file for better performance
3. **Image Storage**: Store Piclet images directly in dataset (currently stores URLs)
4. **Badges/Achievements**: Track discovery milestones
5. **Trading System**: Allow users to trade variations
6. **Seasonal Events**: Time-limited discoveries
7. **Rate Limiting**: Per-user rate limits to prevent abuse
8. **Caching**: Cache AI responses for identical images

## Security Considerations

- **Token Verification**: All operations verify HF OAuth tokens via `https://huggingface.co/oauth/userinfo`
- **User Attribution**: Discoveries tracked by stable HF `sub` (user ID), not username
- **Fair GPU Usage**: Users consume their own GPU quota, not server's
- **Public Data**: All discovery data is public by design (embracing open discovery)
- **No Client Manipulation**: AI orchestration happens server-side only
- **Input Validation**: File uploads and token formats validated
- **No Sensitive Data**: No passwords or private info stored
- **Future**: Rate limiting per user to prevent abuse
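
The token verification and `sub`-based attribution listed above could be sketched roughly as follows, using only the standard library. The function names (`fetch_userinfo`, `verify_token`) are hypothetical, and the `preferred_username` claim is an assumption based on standard OpenID Connect claims. The actual server code may be structured differently.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

USERINFO_URL = "https://huggingface.co/oauth/userinfo"


def fetch_userinfo(hf_token: str) -> dict:
    """Request the userinfo document for a bearer token."""
    request = urllib.request.Request(
        USERINFO_URL,
        headers={"Authorization": f"Bearer {hf_token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def verify_token(
    hf_token: str,
    fetch: Callable[[str], dict] = fetch_userinfo,
) -> Optional[dict]:
    """Return the user's identity, or None if the token cannot be verified."""
    if not hf_token or not hf_token.strip():
        return None
    try:
        info = fetch(hf_token)
    except (urllib.error.URLError, ValueError):
        # Covers HTTP errors (e.g. 401), network failures, and invalid JSON.
        return None

    sub = info.get("sub")
    if not sub:
        return None
    # Key profiles by the stable `sub`; the username is kept for display only.
    # Assumption: the username arrives as the OIDC `preferred_username` claim.
    return {"sub": sub, "username": info.get("preferred_username")}
```

Keying on `sub` means a user who renames their account keeps their discoveries. The `fetch` parameter exists only so the check can be exercised without a network call.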