---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Details

This is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu using the C_AdamW (Cautious AdamW) optimizer. It was trained on 20B tokens, following the 20× Chinchilla rule.

# How to use

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)
print(pipe("The key to life is"))
```

# Downstream Eval

## ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA

```shell
lm_eval --model hf \
    --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
    --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
    --device cuda:7 \
    --batch_size 8
```

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ | 0.2730|± |0.0130|
| | |none | 0|acc_norm |↑ | 0.2765|± |0.0131|
|arc_easy | 1|none | 0|acc |↑ | 0.5960|± |0.0101|
| | |none | 0|acc_norm |↑ | 0.5290|± |0.0102|
|hellaswag | 1|none | 0|acc |↑ | 0.3442|± |0.0047|
| | |none | 0|acc_norm |↑ | 0.4122|± |0.0049|
|lambada_openai| 1|none | 0|acc |↑ | 0.3264|± |0.0065|
| | |none | 0|perplexity|↓ |39.7510|± |1.6063|
|openbookqa | 1|none | 0|acc |↑ | 0.2200|± |0.0185|
| | |none | 0|acc_norm |↑ | 0.3300|± |0.0210|
|piqa | 1|none | 0|acc |↑ | 0.6872|± |0.0108|
| | |none | 0|acc_norm |↑ | 0.6850|± |0.0108|

## MMLU

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.2536|± |0.0037|
| - humanities | 2|none | |acc |↑ |0.2667|± |0.0064|
| - other | 2|none | |acc |↑ |0.2475|± |0.0077|
| - social sciences| 2|none | |acc |↑ |0.2337|± |0.0076|
| - stem | 2|none | |acc |↑ |0.2594|± |0.0078|
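
The MMLU command was not recorded in this card; a plausible reconstruction, assuming the same lm_eval setup as the other tasks (same model args, device, and batch size; only `--tasks` changes), is:

```shell
# Assumed invocation mirroring the eval command above; not the verified original.
lm_eval --model hf \
    --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
    --tasks mmlu \
    --device cuda:7 \
    --batch_size 8
```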