# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Hugging Face Space that serves as the **complete backend** for the Piclets Discovery game. It orchestrates AI services, handles Piclet generation, and manages persistent storage.

**Core Concept**: Each real-world object has ONE canonical Piclet! Players photograph objects, and the server uses AI to generate Pokémon-style creatures, tracking each object's canonical discovery and its variations (e.g., "velvet pillow" is a variation of the canonical "pillow").

**Architecture Philosophy**: The server handles ALL AI orchestration securely. The frontend is a pure UI that makes a single API call. This prevents client-side manipulation and ensures fair play.

## Architecture

### Storage System
- **HuggingFace Dataset**: `Fraser/piclets` (public dataset repository)
- **Structure**:
  ```
  piclets/
    {normalized_object_name}.json  # e.g., pillow.json
  users/
    {username}.json                 # User profiles
  metadata/
    stats.json                      # Global statistics
    leaderboard.json               # Top discoverers
  ```

### Object Normalization
Objects are normalized for consistent storage:
- Convert to lowercase
- Remove articles (the, a, an)
- Handle pluralization (pillows → pillow)
- Replace spaces with underscores
- Remove special characters

Examples:
- "The Blue Pillow" → `pillow`
- "wooden chairs" → `wooden_chair`
- "glasses" → `glass` (special case handling)
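
The listed rules can be sketched as below. This is a minimal illustration, not the repository's implementation: the names `normalize_object_name`, `singularize`, and `IRREGULAR_SINGULARS` are hypothetical, and singularizing only the last word is an assumption. Note that it applies only the rules listed above, so "The Blue Pillow" comes out as `blue_pillow`; the `pillow` result in the examples implies an extra attribute-stripping step that is not described here.

```python
import re

ARTICLES = {"the", "a", "an"}

# Hypothetical special-case table; the text above only mentions "glasses".
IRREGULAR_SINGULARS = {"glasses": "glass"}


def singularize(word: str) -> str:
    """Naive singularization covering a few common English plural endings."""
    if word in IRREGULAR_SINGULARS:
        return IRREGULAR_SINGULARS[word]
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # batteries -> battery
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]                # boxes -> box
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # pillows -> pillow
    return word


def normalize_object_name(name: str) -> str:
    """Lowercase, drop special characters and articles, singularize, join with '_'."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    words = [w for w in cleaned.split() if w not in ARTICLES]
    if words:
        # Assumption: only the head noun (last word) is singularized.
        words[-1] = singularize(words[-1])
    return "_".join(words)
```

The resulting key doubles as the file name under `piclets/`, which is why the function strips everything except letters, digits, and underscores.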

### Piclet Data Structure
```json
{
  "canonical": {
    "objectName": "pillow",
    "typeId": "pillow_canonical",
    "discoveredBy": "username",
    "discoveredAt": "2024-07-26T10:30:00",
    "scanCount": 42,
    "picletData": {
      // Full Piclet instance data
    }
  },
  "variations": [
    {
      "typeId": "pillow_001",
      "attributes": ["velvet", "blue"],
      "discoveredBy": "username2",
      "discoveredAt": "2024-07-26T11:00:00",
      "scanCount": 5,
      "picletData": {
        // Full variation data
      }
    }
  ]
}
```

## API Endpoints

The frontend only needs these **5 public endpoints**:

### 1. **generate_piclet** (Scanner)
Complete Piclet generation workflow - the main endpoint.

- **Input**:
  - `image`: User's photo (File)
  - `hf_token`: User's HuggingFace OAuth token (string)
- **Process**:
  1. Verifies `hf_token` → gets user info
  2. Uses token to connect to **JoyCaption** → generates detailed image description
  3. Uses token to call **GPT-OSS-120B** → generates Pokémon concept (object, variation, stats, description)
  4. Parses concept to extract structured data
  5. Uses token to call **Flux-Schnell** → generates Piclet image
  6. Checks dataset for canonical/variation match
  7. Saves to dataset with user attribution
  8. Updates user profile (discoveries, rarity score)
- **Returns**:
  ```json
  {
    "success": true,
    "piclet": {/* complete Piclet data */},
    "discoveryStatus": "new" | "variation" | "existing",
    "canonicalId": "pillow_canonical",
    "message": "Congratulations! You discovered the first pillow Piclet!"
  }
  ```
- **Security**: Uses user's token to call AI services, consuming THEIR GPU quota (not the server's)
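
The control flow of the process above can be sketched as follows. This is an outline under stated assumptions rather than the actual server code: `PipelineSteps` and every callable in it are hypothetical stand-ins for the real integrations, steps 6 to 8 are collapsed into one `save_discovery` callable, and the `error` field in the failure response is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineSteps:
    """Hypothetical bundle of callables, one per external dependency."""
    verify_token: Callable[[str], Optional[dict]]      # step 1
    caption_image: Callable[[str, str], str]           # step 2 (JoyCaption)
    generate_concept: Callable[[str, str], str]        # step 3 (GPT-OSS-120B)
    parse_concept: Callable[[str], dict]               # step 4
    generate_image: Callable[[str, str], str]          # step 5 (Flux-Schnell)
    save_discovery: Callable[[dict, dict, str], dict]  # steps 6-8


def generate_piclet(image_path: str, hf_token: str, steps: PipelineSteps) -> dict:
    """Run the pipeline in order; stop early if the token is not valid."""
    user = steps.verify_token(hf_token)
    if user is None:
        return {"success": False, "error": "Invalid or expired token"}
    try:
        caption = steps.caption_image(image_path, hf_token)
        concept_text = steps.generate_concept(caption, hf_token)
        concept = steps.parse_concept(concept_text)
        # "imagePrompt" is an assumed key name for the parsed image prompt.
        image_url = steps.generate_image(concept["imagePrompt"], hf_token)
        result = steps.save_discovery(concept, user, image_url)
    except Exception as exc:  # any failing step yields a failure response
        return {"success": False, "error": str(exc)}
    return {"success": True, **result}
```

Passing the steps in as callables keeps the ordering logic testable without network access; the real endpoint may wire these together differently.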

### 2. **get_user_piclets** (User Collection)
Get user's discovered Piclets and stats.

- **Input**: `hf_token` (string)
- **Returns**:
  ```json
  {
    "success": true,
    "piclets": [{/* list of discoveries */}],
    "stats": {
      "username": "...",
      "totalFinds": 42,
      "uniqueFinds": 15,
      "rarityScore": 1250
    }
  }
  ```

### 3. **get_object_details** (Object Data)
Get complete object information (canonical + all variations).

- **Input**: `object_name` (string, e.g., "pillow", "macbook")
- **Returns**:
  ```json
  {
    "success": true,
    "objectName": "pillow",
    "canonical": {/* canonical data */},
    "variations": [{/* variation 1 */}, {/* variation 2 */}],
    "totalScans": 157,
    "variationCount": 8
  }
  ```
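
As an illustration of how this response could be derived from a stored object file, here is a small sketch. The function name `summarize_object` is hypothetical, and it assumes `totalScans` is the canonical `scanCount` plus the `scanCount` of every variation; the source does not state how the total is computed.

```python
def summarize_object(record: dict) -> dict:
    """Build a get_object_details-style response from one piclets/*.json record."""
    canonical = record["canonical"]
    variations = record.get("variations", [])
    # Assumption: total scans = canonical scans + all variation scans.
    total_scans = canonical.get("scanCount", 0) + sum(
        variation.get("scanCount", 0) for variation in variations
    )
    return {
        "success": True,
        "objectName": canonical["objectName"],
        "canonical": canonical,
        "variations": variations,
        "totalScans": total_scans,
        "variationCount": len(variations),
    }
```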

### 4. **get_recent_activity** (Activity Feed)
Recent discoveries across all users.

- **Input**: `limit` (int, default 20)
- **Returns**: List of recent discoveries with timestamps

### 5. **get_leaderboard** (Top Users)
Top discoverers by rarity score.

- **Input**: `limit` (int, default 10)
- **Returns**: Ranked users with stats

---

**Internal Functions** (not exposed to frontend):
- `search_piclet()`, `create_canonical()`, `create_variation()`, `increment_scan_count()` - Used internally by `generate_piclet()`

## Rarity System

Scan count determines rarity:
- **Legendary**: ≤ 5 scans
- **Epic**: 6-20 scans
- **Rare**: 21-50 scans
- **Uncommon**: 51-100 scans
- **Common**: > 100 scans
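
The thresholds above map directly onto a lookup function. A minimal sketch, with the hypothetical name `rarity_tier`:

```python
def rarity_tier(scan_count: int) -> str:
    """Map a scan count to its rarity tier using the thresholds listed above."""
    if scan_count <= 5:
        return "Legendary"
    if scan_count <= 20:
        return "Epic"
    if scan_count <= 50:
        return "Rare"
    if scan_count <= 100:
        return "Uncommon"
    return "Common"
```

Because rarity depends on the current scan count, a Piclet's tier drops as more players scan the same object.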

Rarity scoring for leaderboard:
- Canonical discovery: +100 points
- Variation discovery: +50 points
- Additional bonuses based on rarity tier
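
The base points can be expressed as a small function. This sketch assumes the `discoveryStatus` values from `generate_piclet` are used as keys (`"new"` for a canonical discovery) and that re-scanning an existing Piclet earns no base points. The tier bonus amounts are not specified here, so `tier_bonus` is left as a caller-supplied parameter; all names are hypothetical.

```python
# Base points from the list above, keyed by discoveryStatus (assumed mapping).
BASE_POINTS = {"new": 100, "variation": 50}


def discovery_points(discovery_status: str, tier_bonus: int = 0) -> int:
    """Return leaderboard points for one discovery, plus an optional tier bonus."""
    return BASE_POINTS.get(discovery_status, 0) + tier_bonus
```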

## Authentication Strategy

**Web UI Authentication**:
- Gradio `auth` protects web interface from casual access
- Requires username="admin" and password from `ADMIN_PASSWORD` env var
- Prevents random users from manually creating piclets via UI
- **Does NOT affect API access** - programmatic clients bypass this

**API-Level Authentication**:
- OAuth token verification for user attribution
- Tokens verified via `https://huggingface.co/oauth/userinfo`
- User profiles keyed by stable HF `sub` (user ID)
- All discovery data is public (embracing open discovery)
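
Token verification against the userinfo endpoint might look like the sketch below, which uses only the standard library. The function name `verify_hf_token` and the injectable `urlopen` parameter are assumptions made for illustration and testing; the actual server may use a different HTTP client and different error handling.

```python
import json
import urllib.request
from typing import Callable, Optional

USERINFO_URL = "https://huggingface.co/oauth/userinfo"


def verify_hf_token(
    hf_token: str, urlopen: Callable = urllib.request.urlopen
) -> Optional[dict]:
    """Return the userinfo payload for a valid token, or None if verification fails."""
    if not hf_token:
        return None
    request = urllib.request.Request(
        USERINFO_URL, headers={"Authorization": f"Bearer {hf_token}"}
    )
    try:
        with urlopen(request, timeout=10) as response:
            info = json.load(response)
    except (OSError, ValueError):
        # OSError covers HTTP/network errors and timeouts; ValueError covers bad JSON.
        return None
    # Profiles are keyed by the stable `sub` claim, so require it to be present.
    if isinstance(info, dict) and "sub" in info:
        return info
    return None
```

Keying profiles on `sub` rather than the username means a user who renames their account keeps their discoveries.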

## Integration with Frontend

The frontend (`../piclets/`) uses these **5 simple API calls**:

```javascript
// Connect to server
const client = await window.gradioClient.Client.connect("Fraser/piclets-server");

// 1. Scanner - Generate complete Piclet (ONE CALL - server does everything!)
const scanResult = await client.predict("/generate_piclet", {
  image: imageFile,
  hf_token: userToken
});
const { success, piclet, discoveryStatus, message } = scanResult.data[0];

// 2. User Collection - Get user's Piclets + stats
const myPiclets = await client.predict("/get_user_piclets", {
  hf_token: userToken
});
const { piclets, stats } = myPiclets.data[0];

// 3. Object Details - Get object info (canonical + variations)
const objectInfo = await client.predict("/get_object_details", {
  object_name: "pillow"
});
const { canonical, variations, totalScans } = objectInfo.data[0];

// 4. Activity Feed - Get recent discoveries
const activity = await client.predict("/get_recent_activity", {
  limit: 20
});

// 5. Leaderboard - Get top users
const leaders = await client.predict("/get_leaderboard", {
  limit: 10
});
```

**Why This Design?**
- **Clean API**: Only 5 endpoints, each with a clear purpose
- **Security**: All AI orchestration happens server-side (can't be manipulated)
- **Simplicity**: Frontend is pure UI, no complex orchestration logic
- **Fairness**: Uses user's GPU quota, not server's
- **Reliability**: Server handles retries and error recovery

## Development

### Local Testing
```bash
pip install -r requirements.txt
python app.py
# Access at http://localhost:7860
```

### Deployment
Push to HuggingFace Space repository:
```bash
git add -A && git commit -m "Update" && git push
```

### Environment Variables
- `HF_TOKEN`: **Required** - HuggingFace write token for dataset operations (set in Space Secrets)
- `ADMIN_PASSWORD`: Optional - Password for web UI access (set in Space Secrets)
- `DATASET_REPO`: Target dataset (default: "Fraser/piclets")

Note: A user's `hf_token` (passed in API calls) is separate from the server's `HF_TOKEN` (used for dataset writes).
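
One way to read these variations at startup is sketched below. `ServerConfig` and `load_config` are hypothetical names; only the variable names, the required/optional status, and the default dataset come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    hf_token: str                  # server write token for dataset operations
    admin_password: Optional[str]  # None when web UI auth is not configured
    dataset_repo: str


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read settings from the environment, failing fast if HF_TOKEN is missing."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required for dataset operations")
    return ServerConfig(
        hf_token=token,
        admin_password=env.get("ADMIN_PASSWORD") or None,
        dataset_repo=env.get("DATASET_REPO", "Fraser/piclets"),
    )
```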

## Key Implementation Details

### AI Service Integration
The server uses `gradio_client` to call external AI services with the user's token:

- **JoyCaption** (`fancyfeast/joy-caption-alpha-two`): Detailed image captioning with brand/model recognition
- **GPT-OSS-120B** (`amd/gpt-oss-120b-chatbot`): Concept generation and parsing
- **Flux-Schnell** (`black-forest-labs/FLUX.1-schnell`): Anime-style Piclet image generation

Each service is called with the user's `hf_token`, consuming their GPU quota.

### Concept Parsing
GPT-OSS generates structured markdown with sections:
- Canonical Object (specific brand/model, not generic)
- Variation (distinctive attribute or "canonical")
- Object Rarity (determines tier)
- Monster Name, Type, Stats
- Physical Stats (height, weight)
- Personality, Description
- Monster Image Prompt

The parser uses regex to extract each section and clean the data.
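
A regex-based section splitter in that spirit is sketched below. The exact markdown layout the model emits is not documented here, so this assumes each section is introduced by a `#`-style heading followed by its body; `parse_sections` and the lowercase-title keys are hypothetical.

```python
import re

# Assumed layout: a markdown heading line, then the body up to the next heading.
SECTION_PATTERN = re.compile(
    r"^#{1,6}[ \t]+(?P<title>[^\n]+?)[ \t]*\n(?P<body>.*?)(?=^#{1,6}[ \t]|\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_sections(markdown: str) -> dict:
    """Split concept markdown into {lowercased heading: cleaned body text}."""
    sections = {}
    for match in SECTION_PATTERN.finditer(markdown):
        key = match.group("title").strip().lower()
        body = match.group("body").strip()
        # Strip surrounding markdown emphasis, backticks, and quotes.
        sections[key] = body.strip('`*" ')
    return sections
```

For example, a response containing `# Canonical Object` followed by `pillow` yields `{"canonical object": "pillow"}`; numeric fields such as stats would still need their own parsing on top of this.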

### Variation Matching
- Uses set intersection to find attribute overlap
- 50% match threshold for variations
- Attributes are normalized and trimmed
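
The matching rule can be illustrated as follows. The source does not say what the 50% is measured against, so this sketch assumes the overlap is divided by the size of the larger attribute set; the function names and the best-score tie-breaking are likewise hypothetical.

```python
from typing import Iterable, Optional

MATCH_THRESHOLD = 0.5  # the 50% threshold described above


def normalize_attributes(attributes: Iterable[str]) -> set:
    """Trim and lowercase attributes, dropping empty entries."""
    return {a.strip().lower() for a in attributes if a.strip()}


def attribute_overlap(scanned: Iterable[str], existing: Iterable[str]) -> float:
    """Fraction of shared attributes, relative to the larger set (an assumption)."""
    a, b = normalize_attributes(scanned), normalize_attributes(existing)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def find_matching_variation(scanned: Iterable[str], variations: list) -> Optional[dict]:
    """Return the best variation at or above the threshold, or None."""
    scanned = list(scanned)
    best, best_score = None, 0.0
    for variation in variations:
        score = attribute_overlap(scanned, variation.get("attributes", []))
        if score >= MATCH_THRESHOLD and score > best_score:
            best, best_score = variation, score
    return best
```

With this definition, a scan tagged `["velvet", "red"]` matches a stored variation tagged `["velvet", "blue"]` (one of two attributes shared), while `["leather"]` does not.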

### Caching Strategy
- Local cache in `cache/` directory
- HuggingFace hub caching for downloads
- Temporary files for uploads

### Error Handling
- Token verification before any operations
- Graceful fallbacks for missing data
- Default user profiles for new users
- `try`/`except` blocks around all operations
- Detailed logging for debugging

## Future Enhancements

1. **Background Removal**: Add server-side background removal (currently done on frontend)
2. **Activity Log**: Separate timeline file for better performance
3. **Image Storage**: Store Piclet images directly in dataset (currently stores URLs)
4. **Badges/Achievements**: Track discovery milestones
5. **Trading System**: Allow users to trade variations
6. **Seasonal Events**: Time-limited discoveries
7. **Rate Limiting**: Per-user rate limits to prevent abuse
8. **Caching**: Cache AI responses for identical images

## Security Considerations

- **Token Verification**: All operations verify HF OAuth tokens via `https://huggingface.co/oauth/userinfo`
- **User Attribution**: Discoveries tracked by stable HF `sub` (user ID), not username
- **Fair GPU Usage**: Users consume their own GPU quota, not server's
- **Public Data**: All discovery data is public by design (embracing open discovery)
- **No Client Manipulation**: AI orchestration happens server-side only
- **Input Validation**: File uploads and token formats validated
- **No Sensitive Data**: No passwords or private info stored
- **Future**: Rate limiting per user to prevent abuse