Flash Attention 3 builds compatible with `torch.compile`. See [this PR](https://github.com/Dao-AILab/flash-attention/pull/1769) by guilhermeleobas for more details. There is a build here for Torch 2.8.0 and a build for Torch Nightlies from 08/30 onward. A minimal `torch.compile` usage sketch is included at the end of this page.

To reproduce:

## Torch 2.8.0 Build

Compiled from `https://github.com/varunneal/flash-attention` on branch `guilhermeleobas/fa3-compile`.

Compilation commands:

```
pip install -U pip wheel setuptools ninja numpy packaging psutil
pip install torch==2.8.0

git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch fa3-compile

export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE   # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE  # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE  # leave BF16, FP8

# Optional, for faster compilation time
export FLASH_ATTENTION_DISABLE_HDIM64=TRUE
export FLASH_ATTENTION_DISABLE_HDIM96=TRUE
export FLASH_ATTENTION_DISABLE_HDIM192=TRUE
export FLASH_ATTENTION_DISABLE_HDIM256=TRUE

python setup.py bdist_wheel
```

## Torch Nightlies Build

Compiled from `https://github.com/varunneal/flash-attention` on branch `stable`. This is a custom fork that combines [ABI Compatibility](https://github.com/Dao-AILab/flash-attention/pull/1791) with `torch.compile` compatibility. This build should be compatible with Torch Nightlies from 08/30 onward.

Compilation commands:

```
pip install -U pip wheel setuptools ninja numpy packaging psutil

# Any Torch Nightly after 08/30 should be alright
pip install --pre "torch==2.10.0.dev20250926+cu126" --index-url https://download.pytorch.org/whl/nightly/cu126

git clone https://github.com/varunneal/flash-attention
cd flash-attention/hopper
git switch stable

export MAX_JOBS=32
export FLASH_ATTENTION_FORCE_BUILD=TRUE   # skip prebuilt wheel fetch
export FLASH_ATTENTION_DISABLE_SM80=TRUE  # Hopper-only
export FLASH_ATTENTION_DISABLE_FP16=TRUE  # leave BF16, FP8

python setup.py bdist_wheel
```

## Tips for ARM builds

On an aarch64/ARM64 system, such as a GH200 server, building requires a bit of finesse. Try:

```
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export MAX_JOBS=4
```

Please contact me if you would like me to build wheels for any other version of Python or Torch.
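
## Usage sketch with `torch.compile`

A minimal sketch of calling the kernel inside a `torch.compile`-d function, assuming one of the wheels above is installed on a Hopper GPU. The module name `flash_attn_interface` and the `flash_attn_func(q, k, v, causal=...)` call follow the upstream hopper package; the shapes, dtype, and `causal` flag below are illustrative, not prescriptive.

```python
import torch
from flash_attn_interface import flash_attn_func  # module shipped by the hopper build

@torch.compile(fullgraph=True)  # fullgraph=True: fail loudly if the op causes a graph break
def attention(q, k, v):
    out = flash_attn_func(q, k, v, causal=True)
    # Some FA3 versions return (out, softmax_lse); keep only the output tensor.
    return out[0] if isinstance(out, tuple) else out

# (batch, seqlen, nheads, headdim) in BF16, matching the build flags above
q, k, v = (torch.randn(2, 4096, 16, 128, device="cuda", dtype=torch.bfloat16) for _ in range(3))
print(attention(q, k, v).shape)  # expected: torch.Size([2, 4096, 16, 128])
```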