---
base_model:
- ibm-granite/granite-4.0-micro
license: apache-2.0
library_name: transformers
tags:
- language
- unsloth
- granite-4.0
---
See our collection for all versions of Granite-4.0, including GGUF, 4-bit, and 16-bit formats.

Learn how to run Granite-4.0 correctly by reading our Guide.

See Unsloth Dynamic 2.0 GGUFs for our quantization benchmarks.
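For a quick local test, below is a minimal inference sketch with `transformers`, using the base checkpoint listed in this card's metadata. The dtype and device settings are assumptions; adjust them for your hardware.

```python
# Minimal inference sketch for the base Granite-4.0 Micro checkpoint.
# device_map="auto" assumes the accelerate package is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-micro"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16/float32 if bf16 is unsupported
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a haiku about state-space models."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```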
| Benchmarks | Metric | Micro Dense | H Micro Dense | H Tiny MoE | H Small MoE |
|---|---|---|---|---|---|
| **General Tasks** | | | | | |
| MMLU | 5-shot | 65.98 | 67.43 | 68.65 | 78.44 |
| MMLU-Pro | 5-shot, CoT | 44.5 | 43.48 | 44.94 | 55.47 |
| BBH | 3-shot, CoT | 72.48 | 69.36 | 66.34 | 81.62 |
| AGI EVAL | 0-shot, CoT | 64.29 | 59 | 62.15 | 70.63 |
| GPQA | 0-shot, CoT | 30.14 | 32.15 | 32.59 | 40.63 |
| **Alignment Tasks** | | | | | |
| AlpacaEval 2.0 | | 29.49 | 31.49 | 30.61 | 42.48 |
| IFEval | Instruct, Strict | 85.5 | 86.94 | 84.78 | 89.87 |
| IFEval | Prompt, Strict | 79.12 | 81.71 | 78.1 | 85.22 |
| IFEval | Average | 82.31 | 84.32 | 81.44 | 87.55 |
| ArenaHard | | 25.84 | 36.15 | 35.75 | 46.48 |
| **Math Tasks** | | | | | |
| GSM8K | 8-shot | 85.45 | 81.35 | 84.69 | 87.27 |
| GSM8K Symbolic | 8-shot | 79.82 | 77.5 | 81.1 | 87.38 |
| Minerva Math | 0-shot, CoT | 62.06 | 66.44 | 69.64 | 74 |
| DeepMind Math | 0-shot, CoT | 44.56 | 43.83 | 49.92 | 59.33 |
| **Code Tasks** | | | | | |
| HumanEval | pass@1 | 80 | 81 | 83 | 88 |
| HumanEval+ | pass@1 | 72 | 75 | 76 | 83 |
| MBPP | pass@1 | 72 | 73 | 80 | 84 |
| MBPP+ | pass@1 | 64 | 64 | 69 | 71 |
| CRUXEval-O | pass@1 | 41.5 | 41.25 | 39.63 | 50.25 |
| BigCodeBench | pass@1 | 39.21 | 37.9 | 41.06 | 46.23 |
| **Tool Calling Tasks** | | | | | |
| BFCL v3 | | 59.98 | 57.56 | 57.65 | 64.69 |
| **Multilingual Tasks** | | | | | |
| MULTIPLE | pass@1 | 49.21 | 49.46 | 55.83 | 57.37 |
| MMMLU | 5-shot | 55.14 | 55.19 | 61.87 | 69.69 |
| INCLUDE | 5-shot | 51.62 | 50.51 | 53.12 | 63.97 |
| MGSM | 8-shot | 28.56 | 44.48 | 45.36 | 38.72 |
| **Safety** | | | | | |
| SALAD-Bench | | 97.06 | 96.28 | 97.77 | 97.3 |
| AttaQ | | 86.05 | 84.44 | 86.61 | 86.64 |

| Benchmarks | # Langs | Languages |
|---|---|---|
| MMMLU | 11 | ar, de, en, es, fr, ja, ko, pt, zh, bn, hi |
| INCLUDE | 14 | hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh |
| MGSM | 5 | en, es, fr, ja, zh |

| Model | Micro Dense | H Micro Dense | H Tiny MoE | H Small MoE |
|---|---|---|---|---|
| Embedding size | 2560 | 2048 | 1536 | 4096 |
| Number of layers | 40 attention | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 |
| Attention head size | 64 | 64 | 128 | 128 |
| Number of attention heads | 40 | 32 | 12 | 32 |
| Number of KV heads | 8 | 8 | 4 | 8 |
| Mamba2 state size | - | 128 | 128 | 128 |
| Number of Mamba2 heads | - | 64 | 48 | 128 |
| MLP / Shared expert hidden size | 8192 | 8192 | 1024 | 1536 |
| Num. Experts | - | - | 64 | 72 |
| Num. active Experts | - | - | 6 | 10 |
| Expert hidden size | - | - | 512 | 768 |
| MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU |
| Sequence length | 128K | 128K | 128K | 128K |
| Position embedding | RoPE | NoPE | NoPE | NoPE |
| # Parameters | 3B | 3B | 7B | 32B |
| # Active parameters | 3B | 3B | 1B | 9B |
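
As a sanity check against the Micro Dense column above, you can inspect the shipped configuration directly. This is a minimal sketch assuming the usual `transformers` attribute names (`hidden_size`, `num_hidden_layers`, and so on); the Granite config class may name some fields differently, so missing attributes print as "n/a".

```python
# Config inspection sketch; attribute names follow common transformers
# conventions and are assumptions, not guaranteed for the Granite classes.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-4.0-micro")

for name in (
    "hidden_size",             # embedding size (2560 for Micro Dense)
    "num_hidden_layers",       # 40 attention layers
    "num_attention_heads",     # 40 heads of size 64
    "num_key_value_heads",     # 8 KV heads (grouped-query attention)
    "intermediate_size",       # MLP hidden size (8192)
    "max_position_embeddings", # 128K sequence length
):
    print(name, "=", getattr(config, name, "n/a"))
```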