Optimum Habana documentation

Optimum for Intel® Gaudi® AI Accelerator

Optimum for Intel Gaudi is the interface between Hugging Face libraries (Transformers, Diffusers, Accelerate, …) and Intel Gaudi AI Accelerators (HPUs). It provides a set of tools enabling easy model loading, training and inference in single- and multi-HPU settings for various downstream tasks, as shown in the tables below.

The Intel Gaudi AI accelerator family currently includes three product generations: Intel Gaudi 1, Intel Gaudi 2, and Intel Gaudi 3. Each server is equipped with 8 devices, known as Habana Processing Units (HPUs), providing 128GB of memory per device on Gaudi 3, 96GB on Gaudi 2, and 32GB on first-gen Gaudi. For more details on the underlying hardware architecture, check out the Gaudi Architecture Overview. The Optimum for Intel Gaudi library is fully compatible with all three generations of Gaudi accelerators.
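As a quick sanity check on the numbers above, the aggregate memory available per 8-device server works out as follows (a small illustrative calculation, not part of the library):

```python
# Per-device HBM capacity in GB for each Gaudi generation, as stated above.
hbm_per_device_gb = {"Gaudi": 32, "Gaudi 2": 96, "Gaudi 3": 128}
DEVICES_PER_SERVER = 8  # each Gaudi server carries 8 HPUs

# Total memory available per 8-device server, per generation.
server_hbm_gb = {gen: gb * DEVICES_PER_SERVER for gen, gb in hbm_per_device_gb.items()}
print(server_hbm_gb)  # {'Gaudi': 256, 'Gaudi 2': 768, 'Gaudi 3': 1024}
```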

The following model architectures, tasks and device distributions have been validated for Optimum for Intel Gaudi:

In the tables below, ✅ means single-card, multi-card and DeepSpeed have all been validated.

  • Transformers:

| Architecture | Training | Inference |
|---|---|---|
| BERT | ✅ | ✅ |
| RoBERTa | ✅ | ✅ |
| ALBERT | ✅ | ✅ |
| DistilBERT | ✅ | ✅ |
| GPT2 | ✅ | ✅ |
| BLOOM(Z) | | DeepSpeed |
| StarCoder / StarCoder2 | | Single card |
| GPT-J | DeepSpeed | Single card, DeepSpeed |
| GPT-Neo | | Single card |
| GPT-NeoX | DeepSpeed | DeepSpeed |
| OPT | | DeepSpeed |
| Llama 2 / CodeLlama / Llama 3 / Llama Guard / Granite | ✅ | ✅ |
| StableLM | | Single card |
| Falcon | LoRA | |
| CodeGen | | Single card |
| MPT | | Single card |
| Mistral | | Single card |
| Phi | | Single card |
| Mixtral | | Single card |
| Persimmon | | Single card |
| Qwen2 / Qwen3 | Single card | Single card |
| Qwen2-MoE | | Single card |
| Gemma | | Single card |
| Gemma2 | ✅ | ✅ |
| Gemma3 | ✅ | ✅ |
| XGLM | | Single card |
| Cohere | | Single card |
| T5 / Flan T5 | ✅ | ✅ |
| BART | | Single card |
| ViT | ✅ | ✅ |
| Swin | ✅ | ✅ |
| Wav2Vec2 | ✅ | ✅ |
| Whisper | ✅ | ✅ |
| SpeechT5 | | Single card |
| CLIP | ✅ | ✅ |
| BridgeTower | ✅ | ✅ |
| ESMFold | | Single card |
| Blip | | Single card |
| OWLViT | | Single card |
| ClipSeg | | Single card |
| Llava / Llava-next / Llava-onevision | | Single card |
| idefics2 | LoRA | Single card |
| Paligemma | | Single card |
| Segment Anything Model | | Single card |
| VideoMAE | | Single card |
| TableTransformer | | Single card |
| DETR | | Single card |
| Mllama | LoRA | |
| MiniCPM3 | | Single card |
| Baichuan2 | DeepSpeed | Single card |
| DeepSeek-V2 | ✅ | ✅ |
| DeepSeek-V3 / Moonlight | ✅ | ✅ |
| ChatGLM | DeepSpeed | Single card |
| Qwen2-VL | | Single card |
| Qwen2.5-VL | | Single card |
| VideoLLaVA | | Single card |
| GLM-4V | | Single card |
| Arctic | | DeepSpeed |
| GPT-OSS | | DeepSpeed |
  • Diffusers:

| Architecture | Training | Inference |
|---|---|---|
| Stable Diffusion | ✅ | ✅ |
| Stable Diffusion XL | ✅ | ✅ |
| Stable Diffusion Depth2img | | Single card |
| Stable Diffusion 3 | ✅ | ✅ |
| LDM3D | | Single card |
| FLUX.1 | LoRA | Single card |
| Text to Video | | Single card |
| Image to Video | | Single card |
| i2vgen-xl | | Single card |
| Wan | ✅ | ✅ |
  • PyTorch Image Models (TIMM):

| Architecture | Training | Inference |
|---|---|---|
| FastViT | | Single card |
  • TRL:

| Architecture | Training | Inference |
|---|---|---|
| Llama 2 | ✅ | |
| Llama 2 | ✅ | |
| Stable Diffusion | ✅ | |

Other models and tasks supported by the 🤗 Transformers and 🤗 Diffusers libraries may also work. You can refer to this section for using them with 🤗 Optimum for Intel Gaudi. In addition, this page explains how to modify any example from the 🤗 Transformers library to make it work with 🤗 Optimum for Intel Gaudi.
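To give a flavor of what adapting a Transformers example involves, the sketch below swaps the `Trainer` and `TrainingArguments` classes for their Gaudi counterparts from `optimum.habana`. This is a minimal, hedged sketch (the model name, Gaudi config name and dataset are placeholders); it requires a machine with Gaudi devices and the Habana software stack installed, so it will not run on CPU-only hosts:

```python
# Sketch: adapting a Transformers fine-tuning script for Gaudi (HPU).
# Requires an Intel Gaudi machine with the Habana software stack and
# `optimum-habana` installed.
from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# GaudiTrainingArguments mirrors TrainingArguments, plus HPU-specific flags.
training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,                               # run on HPU instead of CPU/GPU
    use_lazy_mode=True,                            # enable HPU lazy (graph) execution
    gaudi_config_name="Habana/bert-base-uncased",  # pre-tuned Gaudi config from the Hub
    num_train_epochs=1,
)

train_dataset = ...  # placeholder: your tokenized training dataset

# GaudiTrainer is a drop-in replacement for transformers.Trainer.
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

The rest of a typical training script (tokenization, metrics, evaluation) usually carries over unchanged.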
