# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Overview
This is a Hugging Face Space that provides a GPU-accelerated JupyterLab environment for training and simulating robots using the MuJoCo physics engine. The space covers a wide range of robotics applications including locomotion, manipulation, motion tracking, and general physics simulation. It is designed to run in a Docker container with NVIDIA GPU support for hardware-accelerated physics rendering.
## What This Environment Supports
This is a general-purpose MuJoCo training environment with sample notebooks covering:
1. **General MuJoCo Physics** (`tutorial.ipynb`) - Comprehensive introduction to MuJoCo fundamentals including basic rendering, simulation loops, contacts, friction, tendons, actuators, sensors, and advanced rendering techniques
2. **Locomotion** (`locomotion.ipynb`) - Training quadrupedal and bipedal robots for walking, running, and acrobatic behaviors. Includes environments for Unitree Go1/G1, Boston Dynamics Spot, Google Barkour, Berkeley Humanoid, Unitree H1, and more
3. **Manipulation** (`manipulation.ipynb`) - Robot arm and dexterous hand control. Includes Franka Emika Panda pick-and-place tasks and Leap Hand dexterous manipulation with asymmetric actor-critic training
4. **Motion Tracking** (`opentrack.ipynb`) - Humanoid motion tracking and retargeting using the OpenTrack system with motion capture data
## Architecture
### Container Environment
- **Base Image**: nvidia/cuda:12.8.1-devel-ubuntu22.04
- **Python**: 3.13 (Miniconda)
- **GPU Rendering**: Uses EGL (OpenGL for headless rendering) with NVIDIA drivers
- **Web Server**: JupyterLab on port 7860
### Key Components
1. **GPU Initialization** (`init_gpu.py`): Validates GPU setup before starting JupyterLab
- Checks NVIDIA driver accessibility via `nvidia-smi`
- Verifies EGL library availability (libEGL.so.1, libGL.so.1, libEGL_nvidia.so.0)
- Tests EGL device initialization with multiple fallback methods (platform device, default display, surfaceless)
- Validates MuJoCo rendering at multiple resolutions (64x64, 240x320, 480x640)
- Critical environment variables: `MUJOCO_GL=egl`, `PYOPENGL_PLATFORM=egl`, `EGL_PLATFORM=surfaceless`
2. **MuJoCo Playground Setup** (`init_mujoco.py`): Downloads MuJoCo model assets
- Imports `mujoco_playground` which automatically clones the mujoco_menagerie repository
- This repository contains robot models (quadrupeds, bipeds, arms, hands, etc.)
3. **Server Startup** (`start_server.sh`): Container entrypoint
- Sets up NVIDIA EGL library symlinks at runtime (searches /usr/local/nvidia/lib64, /usr/local/cuda/lib64, /usr/lib/nvidia)
- Runs GPU validation (`python init_gpu.py`)
- Downloads MuJoCo assets (`python init_mujoco.py`)
- Disables JupyterLab announcements
- Launches JupyterLab with iframe embedding support for Hugging Face Spaces
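A minimal sketch of the kind of headless-rendering check `init_gpu.py` performs (not the actual script; the tiny MJCF model is illustrative):
```python
import os

# Critical environment variables for headless NVIDIA EGL rendering (see above)
os.environ.setdefault("MUJOCO_GL", "egl")
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")
os.environ.setdefault("EGL_PLATFORM", "surfaceless")

import mujoco

# Tiny illustrative model: a single box above a plane
xml = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <geom type="box" size="0.1 0.1 0.1" pos="0 0 0.5"/>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

with mujoco.Renderer(model, height=64, width=64) as renderer:
    mujoco.mj_forward(model, data)
    renderer.update_scene(data)
    pixels = renderer.render()  # uint8 array of shape (64, 64, 3) if EGL works

print("EGL rendering OK:", pixels.shape)
```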
### Sample Notebooks
Sample notebooks are organized in individual folders within `samples/` and are automatically copied to `/data/workspaces/` at container startup:
- **`samples/tutorial/`** - Complete MuJoCo introduction (2258 lines) covering physics fundamentals, rendering, contacts, actuators, sensors, tendons, and camera control
- **`samples/locomotion/`** - Quadrupedal and bipedal locomotion training (1762 lines) with PPO, domain randomization, curriculum learning, and policy fine-tuning
- **`samples/manipulation/`** - Robot manipulation (649 lines) including pick-and-place (Panda arm) and dexterous manipulation (Leap Hand) with asymmetric actor-critic
- **`samples/opentrack/`** - Humanoid motion tracking/retargeting (603 lines) including dataset download, training, checkpoint conversion, and video generation
Each sample is copied to its own workspace directory (`/data/workspaces/<sample_name>/`) at runtime. Notebooks are only copied if they don't already exist, preserving any user modifications.
## Development Commands
### Running Locally with Docker
```bash
# Build the container
docker build -t mujoco-training .
# Run with GPU support
docker run --gpus all -p 7860:7860 mujoco-training
```
### Testing GPU Setup
```bash
# Validate GPU rendering capabilities (run inside container)
python init_gpu.py
# Check NVIDIA driver
nvidia-smi
# Test EGL libraries
ldconfig -p | grep EGL
```
### JupyterLab Access
- Default port: 7860
- Default token: "huggingface" (set via `JUPYTER_TOKEN` environment variable)
- Default landing page: `/lab/tree/workspaces/locomotion/locomotion.ipynb`
- Notebook working directory: `/data` (when deployed as a Hugging Face Space)
### Persistent Storage and Workspaces
When deployed on Hugging Face Spaces, the `/data` directory is backed by persistent storage. At container startup, `start_server.sh` automatically:
1. Creates `/data/workspaces/` if it doesn't exist
2. For each sample in `samples/`, creates `/data/workspaces/<sample_name>/` if it doesn't exist
3. Copies the `.ipynb` file only if it doesn't already exist in the workspace (preserving user modifications)
4. Copies any additional files from the sample directory (datasets, scripts, etc.)
This ensures:
- User modifications to notebooks are preserved across container restarts
- Each sample has its own isolated workspace for generated data, models, and outputs
- Sample notebooks can include supporting files that are copied to the workspace
- Users can create additional workspaces in `/data/workspaces/` for their own projects
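The actual copy logic lives in `start_server.sh`; below is a rough Python equivalent of the copy-if-missing behavior described above (the `samples/` source path inside the container is an assumption):
```python
import shutil
from pathlib import Path

SAMPLES = Path("samples")              # assumed location of the baked-in samples
WORKSPACES = Path("/data/workspaces")  # persistent storage on Hugging Face Spaces

WORKSPACES.mkdir(parents=True, exist_ok=True)
for sample_dir in sorted(SAMPLES.iterdir()):
    if not sample_dir.is_dir():
        continue
    workspace = WORKSPACES / sample_dir.name
    workspace.mkdir(exist_ok=True)
    for src in sample_dir.iterdir():
        dst = workspace / src.name
        if dst.exists():
            continue                   # never overwrite: user edits survive restarts
        if src.is_dir():
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
```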
## Critical EGL Configuration
The container requires specific EGL configuration for headless GPU rendering:
1. **NVIDIA EGL Vendor Config**: Created at `/usr/share/glvnd/egl_vendor.d/10_nvidia.json` pointing to `libEGL_nvidia.so.0`
2. **Library Path**: `LD_LIBRARY_PATH` includes `/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64`
3. **Runtime Symlinks**: `start_server.sh` creates symlinks to `libEGL_nvidia.so.0` from mounted NVIDIA directories
4. **Environment Variables**: `__EGL_VENDOR_LIBRARY_DIRS=/usr/share/glvnd/egl_vendor.d`
### Troubleshooting EGL Issues
If MuJoCo rendering fails:
1. Verify NVIDIA drivers: `nvidia-smi` should show GPU info
2. Check EGL vendor config: `cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json`
3. Verify library loading: `ldconfig -p | grep EGL`
4. Run comprehensive diagnostic: `python init_gpu.py`
5. Check that `MUJOCO_GL=egl` is set: `echo $MUJOCO_GL`
## Training Workflows
### General MuJoCo Simulation (tutorial.ipynb)
Basic simulation loop:
```python
import mujoco

xml = "<mujoco><worldbody><geom type='sphere' size='0.1'/></worldbody></mujoco>"  # any MJCF string
duration = 3.0  # seconds to simulate

model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)

# Simulation loop
mujoco.mj_resetData(model, data)
while data.time < duration:
    mujoco.mj_step(model, data)
    # Read sensors, apply controls, etc.
```
Rendering:
```python
with mujoco.Renderer(model, height=240, width=320) as renderer:
    mujoco.mj_forward(model, data)
    renderer.update_scene(data, camera="camera_name")
    pixels = renderer.render()
```
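To collect frames during the simulation loop and display them as a video, `mediapy` (in the dependency list) can be used; the duration and frame rate below are illustrative:
```python
import mediapy as media
import mujoco

duration = 3.0   # seconds (illustrative)
framerate = 30   # frames per second (illustrative)

frames = []
mujoco.mj_resetData(model, data)
with mujoco.Renderer(model) as renderer:
    while data.time < duration:
        mujoco.mj_step(model, data)
        # Render only as many frames as the target frame rate requires
        if len(frames) < data.time * framerate:
            renderer.update_scene(data)
            frames.append(renderer.render())

media.show_video(frames, fps=framerate)
```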
### Locomotion Training (locomotion.ipynb)
Typical workflow using Brax + MuJoCo Playground:
1. **Load environment**: `env = registry.load(env_name)`
2. **Get config**: `env_cfg = registry.get_default_config(env_name)`
3. **Configure PPO**: `ppo_params = locomotion_params.brax_ppo_config(env_name)`
4. **Apply domain randomization**: `randomizer = registry.get_domain_randomizer(env_name)`
5. **Train**: Use `brax.training.agents.ppo.train` with the environment and randomization function
6. **Save checkpoints**: Policies saved to `checkpoints/{env_name}/{step}/`
7. **Fine-tune**: Restore from checkpoint and continue training with modified config
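A condensed sketch of steps 1-5 (exact argument handling varies with the Brax and MuJoCo Playground versions pinned in `requirements.txt`, so treat this as an outline rather than the canonical recipe):
```python
import functools

from brax.training.agents.ppo import networks as ppo_networks
from brax.training.agents.ppo import train as ppo
from mujoco_playground import registry, wrapper
from mujoco_playground.config import locomotion_params

env_name = "Go1JoystickFlatTerrain"
env = registry.load(env_name)                             # 1. load environment
env_cfg = registry.get_default_config(env_name)           # 2. default config
ppo_params = locomotion_params.brax_ppo_config(env_name)  # 3. PPO hyperparameters
randomizer = registry.get_domain_randomizer(env_name)     # 4. domain randomization

# The PPO config nests network sizes under `network_factory`;
# split them out so Brax receives a callable factory.
training_params = dict(ppo_params)
network_factory = ppo_networks.make_ppo_networks
if "network_factory" in training_params:
    network_factory = functools.partial(
        ppo_networks.make_ppo_networks, **training_params.pop("network_factory")
    )

train_fn = functools.partial(
    ppo.train,
    **training_params,
    network_factory=network_factory,
    randomization_fn=randomizer,
)
make_inference_fn, params, _metrics = train_fn(           # 5. train
    environment=env,
    wrap_env_fn=wrapper.wrap_for_brax_training,
)
```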
Available environments:
- **Quadrupedal**: Go1JoystickFlatTerrain, Go1JoystickRoughTerrain, Go1Getup, Go1Handstand, Go1Footstand, SpotFlatTerrainJoystick, SpotGetup, SpotJoystickGaitTracking, BarkourJoystick
- **Bipedal**: BerkeleyHumanoidJoystickFlatTerrain, BerkeleyHumanoidJoystickRoughTerrain, G1JoystickFlatTerrain, G1JoystickRoughTerrain, H1InplaceGaitTracking, H1JoystickGaitTracking, Op3Joystick, T1JoystickFlatTerrain, T1JoystickRoughTerrain
Full list: `registry.locomotion.ALL_ENVS`
Key training techniques:
- **Domain Randomization**: Randomizes friction, armature, center of mass, link masses for sim-to-real transfer
- **Energy Penalties**: `energy_termination_threshold`, `reward_config.energy`, `reward_config.dof_acc` to control power consumption and smoothness
- **Curriculum Learning**: Fine-tune from checkpoints with progressively modified reward configs
- **Asymmetric Actor-Critic**: Actor receives proprioception, critic receives privileged simulation state
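Continuing from the sketch above, a hedged example of the curriculum / fine-tuning pattern (the reward values and checkpoint path are illustrative, and `restore_checkpoint_path` is assumed to be supported by the pinned Brax version):
```python
# Curriculum step: tighten energy-related penalties before resuming training
env_cfg.reward_config.energy = -0.003     # illustrative value
env_cfg.reward_config.dof_acc = -2.5e-7   # illustrative value
env = registry.load(env_name, config=env_cfg)

finetune_fn = functools.partial(
    ppo.train,
    **training_params,
    network_factory=network_factory,
    randomization_fn=randomizer,
    # Resume from a previously saved policy (hypothetical path, format as above)
    restore_checkpoint_path="checkpoints/Go1JoystickFlatTerrain/30000000",
)
make_inference_fn, params, _metrics = finetune_fn(
    environment=env,
    wrap_env_fn=wrapper.wrap_for_brax_training,
)
```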
### Manipulation Training (manipulation.ipynb)
Similar to locomotion but focuses on:
- **Pick-and-place tasks**: PandaPickCubeOrientation (trains in ~3 minutes on RTX 4090)
- **Dexterous manipulation**: LeapCubeReorient (trains in ~33 minutes on RTX 4090)
- **Asymmetric observations**: Use `policy_obs_key` and `value_obs_key` in PPO params to train actor on sensor-like data while critic gets privileged state
Available environments: `registry.manipulation.ALL_ENVS`
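A short sketch of the asymmetric-observation setup mentioned above (placing the keys under `network_factory` and the key names `"state"`/`"privileged_state"` are assumptions about the environment's observation dictionary):
```python
from mujoco_playground import registry
from mujoco_playground.config import manipulation_params

env_name = "LeapCubeReorient"
env = registry.load(env_name)
ppo_params = manipulation_params.brax_ppo_config(env_name)

# Actor is trained on sensor-like observations; critic sees privileged simulator state.
ppo_params.network_factory.policy_obs_key = "state"
ppo_params.network_factory.value_obs_key = "privileged_state"
```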
### Motion Tracking (opentrack.ipynb)
OpenTrack workflow for humanoid motion tracking:
1. **Clone repository**: `git clone https://github.com/GalaxyGeneralRobotics/OpenTrack.git`
2. **Download mocap data**: From `huggingface.co/datasets/robfiras/loco-mujoco-datasets` (Lafan1/UnitreeG1)
3. **Train policy**: `python train_policy.py --exp_name debug --terrain_type flat_terrain`
4. **Convert checkpoint**: `python brax2torch.py --exp_name <exp_name>` (Brax → PyTorch)
5. **Generate videos**: `python play_policy.py --exp_name <exp_name> --use_renderer`
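Step 2 can also be scripted with `huggingface_hub` instead of downloading manually (the `allow_patterns` filter is an assumption about the dataset layout):
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="robfiras/loco-mujoco-datasets",
    repo_type="dataset",
    allow_patterns=["Lafan1/UnitreeG1/*"],  # assumed subfolder layout
    local_dir="datasets",
)
```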
## Python Dependencies
Core stack (see `requirements.txt`):
- **JupyterLab**: 4.4.3 (with tornado 6.2 for compatibility)
- **JAX**: CUDA 12 support via `jax[cuda12]`
- **MuJoCo**: 3.3+ with MuJoCo MJX (JAX-based physics)
- **Brax**: JAX-based RL framework for massively parallel training
- **MuJoCo Playground**: Collection of robot environments and training utilities
- **Supporting libraries**: mediapy (video rendering), ipywidgets, nvidia-cusparse-cu12
## File Structure
```
/
├── Dockerfile           # Container with CUDA 12.8 + EGL setup
├── start_server.sh      # Container entrypoint
├── init_gpu.py          # GPU validation script (comprehensive EGL tests)
├── init_mujoco.py       # MuJoCo Playground asset downloader
├── requirements.txt     # Python dependencies
├── packages.txt         # System packages (currently empty)
├── on_startup.sh        # Custom startup commands (placeholder)
├── login.html           # Custom JupyterLab login page
└── samples/             # Example notebooks (organized by topic)
    ├── tutorial/
    │   └── tutorial.ipynb         # MuJoCo fundamentals (2258 lines)
    ├── locomotion/
    │   └── locomotion.ipynb       # Robot locomotion (1762 lines)
    ├── manipulation/
    │   └── manipulation.ipynb     # Robot manipulation (649 lines)
    └── opentrack/
        └── opentrack.ipynb        # Motion tracking (603 lines)
```
When deployed as a Hugging Face Space with persistent storage:
```
/data/                       # Persistent storage volume (mounted at runtime)
└── workspaces/              # Sample workspaces (created by start_server.sh)
    ├── tutorial/
    │   ├── tutorial.ipynb   # Copied from samples/, preserves user edits
    │   └── ...              # User-generated data, models, outputs
    ├── locomotion/
    │   ├── locomotion.ipynb
    │   ├── checkpoints/     # Training checkpoints
    │   └── ...
    ├── manipulation/
    │   ├── manipulation.ipynb
    │   └── ...
    └── opentrack/
        ├── opentrack.ipynb
        ├── datasets/        # Downloaded mocap data
        ├── models/          # Trained models
        └── videos/          # Generated videos
```
## Performance Notes
- **Physics simulation**: Can achieve 50,000+ steps per second on a single GPU with JAX/MJX
- **Rendering**: Typically 30-60 frames per second, far slower than the physics simulation
- **Training times** (on RTX 4090 / L40S):
- Simple manipulation: 3 minutes
- Quadrupedal joystick: 7 minutes
- Bipedal locomotion: 17 minutes
- Dexterous manipulation: 33 minutes
- **Brax parallelization**: Uses thousands of parallel environments for fast training
- **Checkpointing**: Critical for curriculum learning and fine-tuning
## Common Patterns
### Visualization Options
```python
scene_option = mujoco.MjvOption()
scene_option.flags[mujoco.mjtVisFlag.mjVIS_JOINT] = True # Show joints
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTPOINT] = True # Show contacts
scene_option.flags[mujoco.mjtVisFlag.mjVIS_CONTACTFORCE] = True # Show forces
scene_option.flags[mujoco.mjtVisFlag.mjVIS_TRANSPARENT] = True # Transparency
scene_option.flags[mujoco.mjtVisFlag.mjVIS_PERTFORCE] = True # Show perturbations
```
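The flags take effect when the scene is updated; continuing from the rendering example above:
```python
renderer.update_scene(data, camera="camera_name", scene_option=scene_option)
pixels = renderer.render()
```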
### Named Access Pattern
```python
# Instead of using indices
model.geom_rgba[geom_id, :]
# Use named access
model.geom('green_sphere').rgba
data.geom('box').xpos
data.joint('swing').qpos
data.sensor('accelerometer').data
```
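A self-contained example of named access (the MJCF model is illustrative):
```python
import mujoco

xml = """
<mujoco>
  <worldbody>
    <body name="box_body" pos="0 0 0.2">
      <joint name="swing" type="hinge" axis="0 1 0"/>
      <geom name="green_sphere" type="sphere" size="0.05" rgba="0 1 0 1"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
mujoco.mj_forward(model, data)

print(model.geom("green_sphere").rgba)  # [0. 1. 0. 1.]
print(data.geom("green_sphere").xpos)   # world position after mj_forward
print(data.joint("swing").qpos)         # hinge angle (radians)
```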
### Rendering Modes
- **RGB rendering**: `renderer.render()` - returns pixels
- **Depth rendering**: `renderer.enable_depth_rendering()` then `renderer.render()`
- **Segmentation**: `renderer.enable_segmentation_rendering()` - returns object IDs and types
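A sketch of switching modes on an existing `renderer` (continuing from the earlier rendering example):
```python
# Depth: per-pixel distance from the camera (float32, meters)
renderer.enable_depth_rendering()
renderer.update_scene(data)
depth = renderer.render()
renderer.disable_depth_rendering()

# Segmentation: per-pixel object ID and object type (int32, shape (H, W, 2))
renderer.enable_segmentation_rendering()
renderer.update_scene(data)
seg = renderer.render()
renderer.disable_segmentation_rendering()
```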
## Important Notes
- This is designed for Hugging Face Spaces with GPU instances (NVIDIA L40S or similar)
- All training uses JAX/Brax for massive parallelization across thousands of environments
- Policies are typically saved using Orbax checkpointing for fine-tuning
- Domain randomization is critical for sim-to-real transfer
- The environment supports multiple RL algorithms (PPO, SAC) through Brax
- Asymmetric actor-critic (different observations for policy and value function) is commonly used