---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
# Model Details

This model is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu, using the cautious AdamW (C-AdamW) optimizer. Training followed the Chinchilla rule of roughly 20 tokens per parameter, for 20B tokens seen.
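For context, the "cautious" variant of AdamW masks out update components whose sign disagrees with the current gradient, then rescales the surviving components. The function below is a minimal single-tensor sketch of that idea, not the torchtitan implementation used here; the function name and hyperparameter defaults are illustrative assumptions.

```python
import torch

@torch.no_grad()
def cautious_adamw_step(p, grad, exp_avg, exp_avg_sq, step,
                        lr=3e-4, betas=(0.9, 0.95), eps=1e-8,
                        weight_decay=0.1):
    """One illustrative C-AdamW update for a single parameter tensor."""
    beta1, beta2 = betas
    # Standard AdamW moment updates with bias correction.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = exp_avg / (1 - beta1 ** step)
    v_hat = exp_avg_sq / (1 - beta2 ** step)
    update = m_hat / (v_hat.sqrt() + eps)
    # Cautious masking: keep only components whose sign agrees with the
    # gradient, then rescale so the mean update magnitude is preserved.
    mask = (update * grad > 0).to(update.dtype)
    mask = mask / mask.mean().clamp(min=1e-3)
    # Decoupled weight decay, then the masked descent step.
    p.mul_(1 - lr * weight_decay)
    p.add_(update * mask, alpha=-lr)
    return p
```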
# How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)
print(pipe("The key to life is"))
```
# Downstream Eval

Zero-shot results on ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, and PIQA, evaluated with `lm_eval`:

```shell
lm_eval --model hf \
  --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
  --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
  --device cuda:7 --batch_size 8
```
| Tasks | Version | Filter | n-shot | Metric |  | Value |  | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2730 | ± | 0.0130 |
|  |  | none | 0 | acc_norm | ↑ | 0.2765 | ± | 0.0131 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.5960 | ± | 0.0101 |
|  |  | none | 0 | acc_norm | ↑ | 0.5290 | ± | 0.0102 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.3442 | ± | 0.0047 |
|  |  | none | 0 | acc_norm | ↑ | 0.4122 | ± | 0.0049 |
| lambada_openai | 1 | none | 0 | acc | ↑ | 0.3264 | ± | 0.0065 |
|  |  | none | 0 | perplexity | ↓ | 39.7510 | ± | 1.6063 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.2200 | ± | 0.0185 |
|  |  | none | 0 | acc_norm | ↑ | 0.3300 | ± | 0.0210 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6872 | ± | 0.0108 |
|  |  | none | 0 | acc_norm | ↑ | 0.6850 | ± | 0.0108 |
# MMLU

| Groups | Version | Filter | n-shot | Metric |  | Value |  | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none |  | acc | ↑ | 0.2536 | ± | 0.0037 |
| - humanities | 2 | none |  | acc | ↑ | 0.2667 | ± | 0.0064 |
| - other | 2 | none |  | acc | ↑ | 0.2475 | ± | 0.0077 |
| - social sciences | 2 | none |  | acc | ↑ | 0.2337 | ± | 0.0076 |
| - stem | 2 | none |  | acc | ↑ | 0.2594 | ± | 0.0078 |