---
base_model:
- ibm-granite/granite-4.0-h-tiny
license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.0
---
> [!NOTE]
> Includes Unsloth **chat template fixes**!
> For `llama.cpp`, use `--jinja`.

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.
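
A minimal usage sketch with `transformers` follows. As an assumption, the model id points at the base `ibm-granite/granite-4.0-h-tiny` checkpoint; substitute this repo's id to pick up the chat template fixes mentioned above, which are applied automatically via `apply_chat_template`. For `llama.cpp`, pass `--jinja` so the embedded chat template is used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; replace with this repo's id to use the fixed chat template.
model_id = "ibm-granite/granite-4.0-h-tiny"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The chat template is applied here; no manual prompt formatting needed.
messages = [{"role": "user", "content": "Give one sentence about the Granite 4.0 family."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```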
| Benchmarks | Metric | Micro Dense | H Micro Dense | H Tiny MoE | H Small MoE | 
|---|---|---|---|---|---|
| General Tasks | |||||
| MMLU | 5-shot | 65.98 | 67.43 | 68.65 | 78.44 | 
| MMLU-Pro | 5-shot, CoT | 44.5 | 43.48 | 44.94 | 55.47 | 
| BBH | 3-shot, CoT | 72.48 | 69.36 | 66.34 | 81.62 | 
| AGI EVAL | 0-shot, CoT | 64.29 | 59 | 62.15 | 70.63 | 
| GPQA | 0-shot, CoT | 30.14 | 32.15 | 32.59 | 40.63 | 
| Alignment Tasks | |||||
| AlpacaEval 2.0 | | 29.49 | 31.49 | 30.61 | 42.48 | 
| IFEval | Instruct, Strict | 85.5 | 86.94 | 84.78 | 89.87 | 
| IFEval | Prompt, Strict | 79.12 | 81.71 | 78.1 | 85.22 | 
| IFEval | Average | 82.31 | 84.32 | 81.44 | 87.55 | 
| ArenaHard | | 25.84 | 36.15 | 35.75 | 46.48 | 
| Math Tasks | |||||
| GSM8K | 8-shot | 85.45 | 81.35 | 84.69 | 87.27 | 
| GSM8K Symbolic | 8-shot | 79.82 | 77.5 | 81.1 | 87.38 | 
| Minerva Math | 0-shot, CoT | 62.06 | 66.44 | 69.64 | 74 | 
| DeepMind Math | 0-shot, CoT | 44.56 | 43.83 | 49.92 | 59.33 | 
| Code Tasks | |||||
| HumanEval | pass@1 | 80 | 81 | 83 | 88 | 
| HumanEval+ | pass@1 | 72 | 75 | 76 | 83 | 
| MBPP | pass@1 | 72 | 73 | 80 | 84 | 
| MBPP+ | pass@1 | 64 | 64 | 69 | 71 | 
| CRUXEval-O | pass@1 | 41.5 | 41.25 | 39.63 | 50.25 | 
| BigCodeBench | pass@1 | 39.21 | 37.9 | 41.06 | 46.23 | 
| Tool Calling Tasks | |||||
| BFCL v3 | | 59.98 | 57.56 | 57.65 | 64.69 | 
| Multilingual Tasks | |||||
| MULTIPLE | pass@1 | 49.21 | 49.46 | 55.83 | 57.37 | 
| MMMLU | 5-shot | 55.14 | 55.19 | 61.87 | 69.69 | 
| INCLUDE | 5-shot | 51.62 | 50.51 | 53.12 | 63.97 | 
| MGSM | 8-shot | 28.56 | 44.48 | 45.36 | 38.72 | 
| Safety | |||||
| SALAD-Bench | | 97.06 | 96.28 | 97.77 | 97.3 | 
| AttaQ | | 86.05 | 84.44 | 86.61 | 86.64 | 

| Benchmarks | # Langs | Languages | 
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi | 
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh | 
| MGSM | 5 | en, es, fr, ja, zh | 
| Model | Micro Dense | H Micro Dense | H Tiny MoE | H Small MoE | 
|---|---|---|---|---|
| Embedding size | 2560 | 2048 | 1536 | 4096 | 
| Number of layers | 40 attention | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 | 
| Attention head size | 64 | 64 | 128 | 128 | 
| Number of attention heads | 40 | 32 | 12 | 32 | 
| Number of KV heads | 8 | 8 | 4 | 8 | 
| Mamba2 state size | - | 128 | 128 | 128 | 
| Number of Mamba2 heads | - | 64 | 48 | 128 | 
| MLP / Shared expert hidden size | 8192 | 8192 | 1024 | 1536 | 
| Num. Experts | - | - | 64 | 72 | 
| Num. active Experts | - | - | 6 | 10 | 
| Expert hidden size | - | - | 512 | 768 | 
| MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU | 
| Sequence length | 128K | 128K | 128K | 128K | 
| Position embedding | RoPE | NoPE | NoPE | NoPE | 
| # Parameters | 3B | 3B | 7B | 32B | 
| # Active parameters | 3B | 3B | 1B | 9B |
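
A small sketch for cross-checking the architecture table above against the released config. The attribute names below are assumptions based on common `transformers` config fields; the hybrid Mamba2/MoE config may use different names, so missing fields fall back to `"n/a"`.

```python
from transformers import AutoConfig

# Assumed repo id; substitute this repo's id if it differs.
cfg = AutoConfig.from_pretrained("ibm-granite/granite-4.0-h-tiny")

# Attribute names are assumptions; print the full config object to see all fields.
for field in ("hidden_size", "num_hidden_layers", "num_attention_heads", "num_key_value_heads"):
    print(field, getattr(cfg, field, "n/a"))
```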