# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Overview
This is a Hugging Face Space that provides a GPU-accelerated JupyterLab environment for training and simulating robots using the MuJoCo physics engine. The space covers a wide range of robotics applications including locomotion, manipulation, motion tracking, and general physics simulation. It is designed to run in a Docker container with NVIDIA GPU support for hardware-accelerated physics rendering.
## What This Environment Supports
This is a general-purpose MuJoCo training environment with sample notebooks covering:
1. **General MuJoCo Physics** (`tutorial.ipynb`) - Comprehensive introduction to MuJoCo fundamentals including basic rendering, simulation loops, contacts, friction, tendons, actuators, sensors, and advanced rendering techniques
2. **Locomotion** (`locomotion.ipynb`) - Training quadrupedal and bipedal robots for walking, running, and acrobatic behaviors. Includes environments for Unitree Go1/G1, Boston Dynamics Spot, Google Barkour, Berkeley Humanoid, Unitree H1, and more
3. **Manipulation** (`manipulation.ipynb`) - Robot arm and dexterous hand control. Includes Franka Emika Panda pick-and-place tasks and Leap Hand dexterous manipulation with asymmetric actor-critic training
4. **Motion Tracking** (`opentrack.ipynb`) - Humanoid motion tracking and retargeting using the OpenTrack system with motion capture data
## Architecture
### Container Environment
- **Base Image**: nvidia/cuda:12.8.1-devel-ubuntu22.04
- **Python**: 3.13 (Miniconda)
- **GPU Rendering**: Uses EGL (OpenGL for headless rendering) with NVIDIA drivers
- **Web Server**: JupyterLab on port 7860
### Key Components
1. **GPU Initialization** (`init_gpu.py`): Validates GPU setup before starting JupyterLab
- Checks NVIDIA driver accessibility via `nvidia-smi`
- Verifies EGL library availability (libEGL.so.1, libGL.so.1, libEGL_nvidia.so.0)
- Tests EGL device initialization with multiple fallback methods (platform device, default display, surfaceless)
- Validates MuJoCo rendering at multiple resolutions (64x64, 240x320, 480x640)
- Critical environment variables: `MUJOCO_GL=egl`, `PYOPENGL_PLATFORM=egl`, `EGL_PLATFORM=surfaceless`
2. **MuJoCo Playground Setup** (`init_mujoco.py`): Downloads MuJoCo model assets
- Imports `mujoco_playground` which automatically clones the mujoco_menagerie repository
- This repository contains robot models (quadrupeds, bipeds, arms, hands, etc.)
3. **Server Startup** (`start_server.sh`): Container entrypoint
- Sets up NVIDIA EGL library symlinks at runtime (searches /usr/local/nvidia/lib64, /usr/local/cuda/lib64, /usr/lib/nvidia)
- Runs GPU validation (`python init_gpu.py`)
- Downloads MuJoCo assets (`python init_mujoco.py`)
- Disables JupyterLab announcements
- Launches JupyterLab with iframe embedding support for Hugging Face Spaces
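A minimal sketch of the kind of headless-rendering check `init_gpu.py` performs (not the actual script; the tiny MJCF model is illustrative):
```python
import os

# Critical environment variables for headless NVIDIA EGL rendering (see above)
os.environ.setdefault("MUJOCO_GL", "egl")
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")
os.environ.setdefault("EGL_PLATFORM", "surfaceless")

import mujoco

# Tiny illustrative model: a single box above a plane
xml = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <geom type="box" size="0.1 0.1 0.1" pos="0 0 0.5"/>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

with mujoco.Renderer(model, height=64, width=64) as renderer:
    mujoco.mj_forward(model, data)
    renderer.update_scene(data)
    pixels = renderer.render()  # uint8 array of shape (64, 64, 3) if EGL works

print("EGL rendering OK:", pixels.shape)
```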
### Sample Notebooks
Sample notebooks are organized in individual folders within `samples/` and are automatically copied to `/data/workspaces/` at container startup:
- **`samples/tutorial/`** - Complete MuJoCo introduction (2258 lines) covering physics fundamentals, rendering, contacts, actuators, sensors, tendons, and camera control
- **`samples/locomotion/`** - Quadrupedal and bipedal locomotion training (1762 lines) with PPO, domain randomization, curriculum learning, and policy fine-tuning
- **`samples/manipulation/`** - Robot manipulation (649 lines) including pick-and-place (Panda arm) and dexterous manipulation (Leap Hand) with asymmetric actor-critic
- **`samples/opentrack/`** - Humanoid motion tracking/retargeting (603 lines) including dataset download, training, checkpoint conversion, and video generation
Each sample is copied to its own workspace directory (`/data/workspaces/<sample_name>/`) at runtime. Notebooks are only copied if they don't already exist, preserving any user modifications.
## Development Commands
### Running Locally with Docker
```bash
# Build the container
docker build -t mujoco-training .
# Run with GPU support
docker run --gpus all -p 7860:7860 mujoco-training
```
### Testing GPU Setup
```bash
# Validate GPU rendering capabilities (run inside container)
python init_gpu.py
# Check NVIDIA driver
nvidia-smi
# Test EGL libraries
ldconfig -p | grep EGL
```
### JupyterLab Access
- Default port: 7860
- Default token: "huggingface" (set via `JUPYTER_TOKEN` environment variable)
- Default landing page: `/lab/tree/workspaces/locomotion/locomotion.ipynb`
- Notebook working directory: `/data` (when deployed as a Hugging Face Space)
### Persistent Storage and Workspaces
When deployed on Hugging Face Spaces, the `/data` directory is backed by persistent storage. At container startup, `start_server.sh` automatically:
1. Creates `/data/workspaces/` if it doesn't exist
2. For each sample in `samples/`, creates `/data/workspaces/<sample_name>/` if it doesn't exist
3. Copies the `.ipynb` file only if it doesn't already exist in the workspace (preserving user modifications)
4. Copies any additional files from the sample directory (datasets, scripts, etc.)
This ensures:
- User modifications to notebooks are preserved across container restarts
- Each sample has its own isolated workspace for generated data, models, and outputs
- Sample notebooks can include supporting files that are copied to the workspace
- Users can create additional workspaces in `/data/workspaces/` for their own projects
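The actual copy logic lives in `start_server.sh`; below is a rough Python equivalent of the copy-if-missing behavior described above (the `samples/` source path inside the container is an assumption):
```python
import shutil
from pathlib import Path

SAMPLES = Path("samples")              # assumed location of the baked-in samples
WORKSPACES = Path("/data/workspaces")  # persistent storage on Hugging Face Spaces

WORKSPACES.mkdir(parents=True, exist_ok=True)
for sample_dir in sorted(SAMPLES.iterdir()):
    if not sample_dir.is_dir():
        continue
    workspace = WORKSPACES / sample_dir.name
    workspace.mkdir(exist_ok=True)
    for src in sample_dir.iterdir():
        dst = workspace / src.name
        if dst.exists():
            continue                   # never overwrite: user edits survive restarts
        if src.is_dir():
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
```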
## Critical EGL Configuration
The container requires specific EGL configuration for headless GPU rendering:
1. **NVIDIA EGL Vendor Config**: Created at `/usr/share/glvnd/egl_vendor.d/10_nvidia.json` pointing to `libEGL_nvidia.so.0`
2. **Library Path**: `LD_LIBRARY_PATH` includes `/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64`
3. **Runtime Symlinks**: `start_server.sh` creates symlinks to `libEGL_nvidia.so.0` from mounted NVIDIA directories
4. **Environment Variables**: `__EGL_VENDOR_LIBRARY_DIRS=/usr/share/glvnd/egl_vendor.d`
### Troubleshooting EGL Issues
If MuJoCo rendering fails:
1. Verify NVIDIA drivers: `nvidia-smi` should show GPU info
2. Check EGL vendor config: `cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json`
3. Verify library loading: `ldconfig -p | grep EGL`
4. Run comprehensive diagnostic: `python init_gpu.py`
5. Check that `MUJOCO_GL=egl` is set: `echo $MUJOCO_GL`
## Training Workflows
### General MuJoCo Simulation (tutorial.ipynb)
Basic simulation loop:
```python
import mujoco

xml = "<mujoco><worldbody><geom type='sphere' size='0.1'/></worldbody></mujoco>"  # any MJCF string
duration = 3.0  # seconds to simulate

model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

# Simulation loop
mujoco.mj_resetData(model, data)
while data.time < duration:
    mujoco.mj_step(model, data)
    # Read sensors, apply controls, etc.
```
Rendering:
```python
with mujoco.Renderer(model, height=240, width=320) as renderer:
    mujoco.mj_forward(model, data)
    renderer.update_scene(data, camera="camera_name")
    pixels = renderer.render()
```
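To collect frames during the simulation loop and display them as a video, `mediapy` (in the dependency list) can be used; the duration and frame rate below are illustrative:
```python
import mediapy as media
import mujoco

duration = 3.0   # seconds (illustrative)
framerate = 30   # frames per second (illustrative)

frames = []
mujoco.mj_resetData(model, data)
with mujoco.Renderer(model) as renderer:
    while data.time < duration:
        mujoco.mj_step(model, data)
        # Render only as many frames as the target frame rate requires
        if len(frames) < data.time * framerate:
            renderer.update_scene(data)
            frames.append(renderer.render())

media.show_video(frames, fps=framerate)
```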
### Locomotion Training (locomotion.ipynb)
Typical workflow using Brax + MuJoCo Playground:
1. **Load environment**: `env = registry.load(env_name)`
2. **Get config**: `env_cfg = registry.get_default_config(env_name)`
3. **Configure PPO**: `ppo_params = locomotion_params.brax_ppo_config(env_name)`
4. **Apply domain randomization**: `randomizer = registry.get_domain_randomizer(env_name)`
5. **Train**: Use `brax.training.agents.ppo.train` with the environment and randomization function
6. **Save checkpoints**: Policies saved to `checkpoints/{env_name}/{step}/`
7. **Fine-tune**: Restore from checkpoint and continue training with modified config
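A condensed sketch of steps 1-5 (exact argument handling varies with the Brax and MuJoCo Playground versions pinned in `requirements.txt`, so treat this as an outline rather than the canonical recipe):
```python
import functools

from brax.training.agents.ppo import networks as ppo_networks
from brax.training.agents.ppo import train as ppo
from mujoco_playground import registry, wrapper
from mujoco_playground.config import locomotion_params

env_name = "Go1JoystickFlatTerrain"
env = registry.load(env_name)                             # 1. load environment
env_cfg = registry.get_default_config(env_name)           # 2. default config
ppo_params = locomotion_params.brax_ppo_config(env_name)  # 3. PPO hyperparameters
randomizer = registry.get_domain_randomizer(env_name)     # 4. domain randomization

# The PPO config nests network sizes under `network_factory`;
# split them out so Brax receives a callable factory.
training_params = dict(ppo_params)
network_factory = ppo_networks.make_ppo_networks
if "network_factory" in training_params:
    network_factory = functools.partial(
        ppo_networks.make_ppo_networks, **training_params.pop("network_factory")
    )

train_fn = functools.partial(
    ppo.train,
    **training_params,
    network_factory=network_factory,
    randomization_fn=randomizer,
)
make_inference_fn, params, _metrics = train_fn(           # 5. train
    environment=env,
    wrap_env_fn=wrapper.wrap_for_brax_training,
)
```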
Available environments:
- **Quadrupedal**: Go1JoystickFlatTerrain, Go1JoystickRoughTerrain, Go1Getup, Go1Handstand, Go1Footstand, SpotFlatTerrainJoystick, SpotGetup, SpotJoystickGaitTracking, BarkourJoystick
- **Bipedal**: BerkeleyHumanoidJoystickFlatTerrain, BerkeleyHumanoidJoystickRoughTerrain, G1JoystickFlatTerrain, G1JoystickRoughTerrain, H1InplaceGaitTracking, H1JoystickGaitTracking, Op3Joystick, T1JoystickFlatTerrain, T1JoystickRoughTerrain
Full list: `registry.locomotion.ALL_ENVS`
Key training techniques:
- **Domain Randomization**: Randomizes friction, armature, center of mass, link masses for sim-to-real transfer
- **Energy Penalties**: `energy_termination_threshold`, `reward_config.energy`, `reward_config.dof_acc` to control power consumption and smoothness
- **Curriculum Learning**: Fine-tune from checkpoints with progressively modified reward configs
- **Asymmetric Actor-Critic**: Actor receives proprioception, critic receives privileged simulation state
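Continuing from the sketch above, a hedged example of the curriculum / fine-tuning pattern (the reward values and checkpoint path are illustrative, and `restore_checkpoint_path` is assumed to be supported by the pinned Brax version):
```python
# Curriculum step: tighten energy-related penalties before resuming training
env_cfg.reward_config.energy = -0.003     # illustrative value
env_cfg.reward_config.dof_acc = -2.5e-7   # illustrative value
env = registry.load(env_name, config=env_cfg)

finetune_fn = functools.partial(
    ppo.train,
    **training_params,
    network_factory=network_factory,
    randomization_fn=randomizer,
    # Resume from a previously saved policy (hypothetical path, format as above)
    restore_checkpoint_path="checkpoints/Go1JoystickFlatTerrain/30000000",
)
make_inference_fn, params, _metrics = finetune_fn(
    environment=env,
    wrap_env_fn=wrapper.wrap_for_brax_training,
)
```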
### Manipulation Training (manipulation.ipynb)
Similar to locomotion but focuses on:
- **Pick-and-place tasks**: PandaPickCubeOrientation (trains in ~3 minutes on RTX 4090)
- **Dexterous manipulation**: LeapCubeReorient (trains in ~33 minutes on RTX 4090)
- **Asymmetric observations**: Use `policy_obs_key` and `value_obs_key` in PPO params to train actor on sensor-like data while critic gets privileged state
Available environments: `registry.manipulation.ALL_ENVS`
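A short sketch of the asymmetric-observation setup mentioned above (placing the keys under `network_factory` and the key names `"state"`/`"privileged_state"` are assumptions about the environment's observation dictionary):
```python
from mujoco_playground import registry
from mujoco_playground.config import manipulation_params

env_name = "LeapCubeReorient"
env = registry.load(env_name)
ppo_params = manipulation_params.brax_ppo_config(env_name)

# Actor is trained on sensor-like observations; critic sees privileged simulator state.
ppo_params.network_factory.policy_obs_key = "state"
ppo_params.network_factory.value_obs_key = "privileged_state"
```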
### Motion Tracking (opentrack.ipynb)
OpenTrack workflow for humanoid motion tracking:
1. **Clone repository**: `git clone https://github.com/GalaxyGeneralRobotics/OpenTrack.git`
2. **Download mocap data**: From `huggingface.co/datasets/robfiras/loco-mujoco-datasets` (Lafan1/UnitreeG1)
3. **Train policy**: `python train_policy.py --exp_name debug --terrain_type flat_terrain`
4. **Convert checkpoint**: `python brax2torch.py --exp_name <exp_name>` (Brax → PyTorch)
5. **Generate videos**: `python play_policy.py --exp_name <exp_name> --use_renderer`
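Step 2 can also be scripted with `huggingface_hub` instead of downloading manually (the `allow_patterns` filter is an assumption about the dataset layout):
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="robfiras/loco-mujoco-datasets",
    repo_type="dataset",
    allow_patterns=["Lafan1/UnitreeG1/*"],  # assumed subfolder layout
    local_dir="datasets",
)
```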
## Python Dependencies
Core stack (see `requirements.txt`):
- **JupyterLab**: 4.4.3 (with tornado 6.2 for compatibility)
- **JAX**: CUDA 12 support via `jax[cuda12]`
- **MuJoCo**: 3.3+ with MuJoCo MJX (JAX-based physics)
- **Brax**: JAX-based RL framework for massively parallel training
- **MuJoCo Playground**: Collection of robot environments and training utilities
- **Supporting libraries**: mediapy (video rendering), ipywidgets, nvidia-cusparse-cu12
## File Structure
```
/
├── Dockerfile           # Container with CUDA 12.8 + EGL setup
├── start_server.sh      # Container entrypoint
├── init_gpu.py          # GPU validation script (comprehensive EGL tests)
├── init_mujoco.py       # MuJoCo Playground asset downloader
├── requirements.txt     # Python dependencies
├── packages.txt         # System packages (currently empty)
├── on_startup.sh        # Custom startup commands (placeholder)
├── login.html           # Custom JupyterLab login page
└── samples/             # Example notebooks (organized by topic)
    ├── tutorial/
    │   └── tutorial.ipynb         # MuJoCo fundamentals (2258 lines)
    ├── locomotion/
    │   └── locomotion.ipynb       # Robot locomotion (1762 lines)
    ├── manipulation/
    │   └── manipulation.ipynb     # Robot manipulation (649 lines)
    └── opentrack/
        └── opentrack.ipynb        # Motion tracking (603 lines)
```
When deployed as a Hugging Face Space with persistent storage:
```
/data/                       # Persistent storage volume (mounted at runtime)
└── workspaces/              # Sample workspaces (created by start_server.sh)
    ├── tutorial/
    │   ├── tutorial.ipynb   # Copied from samples/, preserves user edits
    │   └── ...              # User-generated data, models, outputs
    ├── locomotion/
    │   ├── locomotion.ipynb
    │   ├── checkpoints/     # Training checkpoints
    │   └── ...
    ├── manipulation/
    │   ├── manipulation.ipynb
    │   └── ...
    └── opentrack/
        ├── opentrack.ipynb
        ├── datasets/        # Downloaded mocap data
        ├── models/          # Trained models
        └── videos/          # Generated videos
```
## Performance Notes
- **Physics simulation**: Can achieve 50,000+ steps per second on a single GPU with JAX/MJX
- **Rendering**: Typically 30-60 frames per second, far slower than the physics simulation
- **Training times** (on RTX 4090 / L40S):
- Simple manipulation: 3 minutes
- Quadrupedal joystick: 7 minutes
- Bipedal locomotion: 17 minutes
- Dexterous manipulation: 33 minutes
- **Brax parallelization**: Uses thousands of parallel environments for fast training
- **Checkpointing**: Critical for curriculum learning and fine-tuning
## Common Patterns
### Visualization Options
```python
scene_option = mujoco.MjvOption()
scene_option.flags[mujoco.mjtVisFlag.mjVIS_JOINT] = True # Show joints
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = True # Show contacts
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTFORCE] = True # Show forces
scene_option.flags[mujoco.mjtVisFlag.mjVIS_TRANSPARENT] = True # Transparency
scene_option.flags[mujoco.mjtVisFlag.mjVIS_PERTFORCE] = True # Show perturbations
```
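The flags take effect when the scene is updated; continuing from the rendering example above:
```python
renderer.update_scene(data, camera="camera_name", scene_option=scene_option)
pixels = renderer.render()
```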
### Named Access Pattern
```python
# Instead of using indices
model.geom_rgba[geom_id, :]
# Use named access
model.geom('green_sphere').rgba
data.geom('box').xpos
data.joint('swing').qpos
data.sensor('accelerometer').data
```
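A self-contained example of named access (the MJCF model is illustrative):
```python
import mujoco

xml = """
<mujoco>
  <worldbody>
    <body name="box_body" pos="0 0 0.2">
      <joint name="swing" type="hinge" axis="0 1 0"/>
      <geom name="green_sphere" type="sphere" size="0.05" rgba="0 1 0 1"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
mujoco.mj_forward(model, data)

print(model.geom("green_sphere").rgba)  # [0. 1. 0. 1.]
print(data.geom("green_sphere").xpos)   # world position after mj_forward
print(data.joint("swing").qpos)         # hinge angle (radians)
```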
### Rendering Modes
- **RGB rendering**: `renderer.render()` - returns pixels
- **Depth rendering**: `renderer.enable_depth_rendering()` then `renderer.render()`
- **Segmentation**: `renderer.enable_segmentation_rendering()` - returns object IDs and types
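A sketch of switching modes on an existing `renderer` (continuing from the earlier rendering example):
```python
# Depth: per-pixel distance from the camera (float32, meters)
renderer.enable_depth_rendering()
renderer.update_scene(data)
depth = renderer.render()
renderer.disable_depth_rendering()

# Segmentation: per-pixel object ID and object type (int32, shape (H, W, 2))
renderer.enable_segmentation_rendering()
renderer.update_scene(data)
seg = renderer.render()
renderer.disable_segmentation_rendering()
```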
## Important Notes
- This is designed for Hugging Face Spaces with GPU instances (NVIDIA L40S or similar)
- All training uses JAX/Brax for massive parallelization across thousands of environments
- Policies are typically saved using Orbax checkpointing for fine-tuning
- Domain randomization is critical for sim-to-real transfer
- The environment supports multiple RL algorithms (PPO, SAC) through Brax
- Asymmetric actor-critic (different observations for policy and value function) is commonly used