# All Benchmarks Aggregated Report
## Layer Norm

| Implementation | Description |
|---|---|
| HF Kernels Layer Norm | HuggingFace kernels implementation |
| PyTorch Layer Norm | PyTorch native implementation |
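As a point of reference for what both implementations compute, the sketch below checks PyTorch's native `F.layer_norm` against a hand-written normalization over the last dimension (shapes and epsilon are illustrative choices, not taken from the benchmark config):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 8)
weight = torch.ones(8)
bias = torch.zeros(8)

# Reference: normalize over the last dimension with biased variance.
mu = x.mean(-1, keepdim=True)
var = x.var(-1, unbiased=False, keepdim=True)
y_ref = (x - mu) / torch.sqrt(var + 1e-5) * weight + bias

# PyTorch native implementation (the baseline row in the table above).
y = F.layer_norm(x, (8,), weight, bias, eps=1e-5)
```

The two paths should agree to within floating-point tolerance; a fused kernel is expected to match this result while reducing memory traffic.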
## Rotary Position Embeddings

| Implementation | Description |
|---|---|
| HF Kernels Rotary | HuggingFace kernels implementation |
| PyTorch Rotary | PyTorch native implementation |
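A minimal sketch of the rotation both implementations apply, using the common half-split formulation; `apply_rope` and the base of 10000 are illustrative assumptions, not the benchmark's exact code:

```python
import torch

def apply_rope(x, cos, sin):
    # Split the head dimension into two halves and rotate each
    # (x1, x2) pair by the per-position angle.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

seq, dim = 16, 8
inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
t = torch.arange(seq).float()
freqs = torch.outer(t, inv_freq)       # (seq, dim // 2)
cos, sin = freqs.cos(), freqs.sin()

q = torch.randn(seq, dim)
q_rot = apply_rope(q, cos, sin)
```

Because each pair is rotated by a pure rotation, the per-position vector norm is preserved, which is a cheap sanity check for any fused kernel.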
## Flash Attention

| Implementation | Description |
|---|---|
| Flash Attention | Flash Attention implementation |
| HF Kernels Flash Attention | HuggingFace kernels Flash Attention |
| HF Kernels Flash Attention 3 | HuggingFace kernels Flash Attention 3 |
| Memory Efficient Attention | Memory efficient attention implementation |
| Sage Attention | Sage attention implementation |
| xFormers | xFormers attention implementation |
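All of the implementations above compute the same scaled dot-product attention; they differ in how they tile and fuse the computation. A minimal correctness sketch, comparing PyTorch's `scaled_dot_product_attention` (which can dispatch to flash or memory-efficient backends) against the naive formula, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq, head_dim)
q = torch.randn(1, 2, 16, 8)
k = torch.randn(1, 2, 16, 8)
v = torch.randn(1, 2, 16, 8)

# Naive reference: softmax(QK^T / sqrt(d)) V, materializing the full
# attention matrix -- exactly what fused kernels avoid.
scale = q.shape[-1] ** -0.5
attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
ref = attn @ v

# Fused path; backend selection is automatic.
out = F.scaled_dot_product_attention(q, k, v)
```

Any of the benchmarked kernels should reproduce `ref` to within floating-point tolerance; the performance differences come entirely from memory layout and fusion.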
## Causal Conv1D

| Implementation | Description |
|---|---|
| HF Kernels Causal Conv1D | HuggingFace kernels implementation |
| PyTorch Causal Conv1D | PyTorch native implementation |
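A causal depthwise conv1d can be expressed in native PyTorch by left-padding with `kernel - 1` zeros so that the output at time `t` depends only on inputs up to `t`. The shapes below are illustrative, not the benchmark's:

```python
import torch
import torch.nn.functional as F

batch, channels, seq = 2, 3, 10
kernel = 4
x = torch.randn(batch, channels, seq)
w = torch.randn(channels, 1, kernel)   # depthwise weights

# Left-pad only, so no future positions leak into the output.
x_pad = F.pad(x, (kernel - 1, 0))
y = F.conv1d(x_pad, w, groups=channels)

# Causality check: perturbing the last input must not change
# any earlier output position.
x2 = x.clone()
x2[..., -1] += 1.0
y2 = F.conv1d(F.pad(x2, (kernel - 1, 0)), w, groups=channels)
```

A fused kernel performs the same computation without materializing the padded tensor.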
## Activation (SwiGLU)

| Implementation | Description |
|---|---|
| HF Kernels SwiGLU | HuggingFace kernels SwiGLU implementation |
| PyTorch SwiGLU | PyTorch native SwiGLU implementation |
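SwiGLU in native PyTorch is the product of a SiLU-gated projection with a second linear projection. A minimal sketch with illustrative dimensions (`gate_proj` and `up_proj` are conventional names, not taken from the benchmark code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16)
gate_proj = torch.nn.Linear(16, 32, bias=False)
up_proj = torch.nn.Linear(16, 32, bias=False)

# SwiGLU: silu(x W_gate) elementwise-multiplied by x W_up.
y = F.silu(gate_proj(x)) * up_proj(x)
```

A fused kernel typically computes both projections and the gating in one pass instead of three separate ops.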
## ReLU

| Implementation | Description |
|---|---|
| HF Kernels ReLU | HuggingFace kernels ReLU implementation |
| PyTorch ReLU | PyTorch native ReLU implementation |
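For completeness, the native baseline here is simply elementwise `max(x, 0)`:

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0])
y = torch.relu(x)   # clamps negatives to zero
```

Since the op is memory-bound, any speedup from a custom kernel comes from launch overhead and memory access, not arithmetic.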