---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
# Model Details
This is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu, using the C_AdamW optimizer. Training followed the 20x Chinchilla rule (roughly 20 tokens per parameter), for a total of 20B tokens seen.
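As a quick check on the token budget, the sketch below applies a 20-tokens-per-parameter heuristic to a nominal 1B parameters. The names `TOKENS_PER_PARAM` and `chinchilla_tokens` are illustrative, and the parameter count is assumed to be exactly 1e9 rather than read from the checkpoint.

```python
# Rough Chinchilla-style token budget: tokens ~= 20 x parameters.
# Assumption: "1B" is treated as exactly 1e9 parameters for illustration.
TOKENS_PER_PARAM = 20


def chinchilla_tokens(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the token budget implied by a tokens-per-parameter heuristic."""
    return n_params * tokens_per_param


n_params = 1_000_000_000
budget = chinchilla_tokens(n_params)
print(f"{budget / 1e9:.0f}B tokens")  # prints "20B tokens"
```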
# How to use
```
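from transformers import pipeline

# Build a text-generation pipeline that downloads the checkpoint from the Hub.
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)

# Generate a continuation for a short prompt and print the raw pipeline output.
print(pipe("The key to life is"))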
```
# Downstream Eval
## ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA
```
lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8
```
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ | 0.2730|± |0.0130|
| | |none | 0|acc_norm |↑ | 0.2765|± |0.0131|
|arc_easy | 1|none | 0|acc |↑ | 0.5960|± |0.0101|
| | |none | 0|acc_norm |↑ | 0.5290|± |0.0102|
|hellaswag | 1|none | 0|acc |↑ | 0.3442|± |0.0047|
| | |none | 0|acc_norm |↑ | 0.4122|± |0.0049|
|lambada_openai| 1|none | 0|acc |↑ | 0.3264|± |0.0065|
| | |none | 0|perplexity|↓ |39.7510|± |1.6063|
|openbookqa | 1|none | 0|acc |↑ | 0.2200|± |0.0185|
| | |none | 0|acc_norm |↑ | 0.3300|± |0.0210|
|piqa | 1|none | 0|acc |↑ | 0.6872|± |0.0108|
| | |none | 0|acc_norm |↑ | 0.6850|± |0.0108|
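To make the ± columns easier to read, the sketch below turns each reported accuracy and standard error into an approximate 95% interval. It assumes a normal approximation (value ± 1.96 × stderr); the numbers are copied from the `acc` rows of the table above, and the helper name `interval` is illustrative.

```python
# Normal-approximation 95% intervals from the reported value +/- stderr.
# The (accuracy, stderr) pairs are copied from the acc rows of the table.
Z_95 = 1.96  # two-sided 95% quantile of the standard normal

results = {
    "arc_challenge": (0.2730, 0.0130),
    "arc_easy": (0.5960, 0.0101),
    "hellaswag": (0.3442, 0.0047),
    "lambada_openai": (0.3264, 0.0065),
    "openbookqa": (0.2200, 0.0185),
    "piqa": (0.6872, 0.0108),
}


def interval(value: float, stderr: float, z: float = Z_95) -> tuple:
    """Return the (low, high) bounds of value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


for task, (acc, se) in results.items():
    low, high = interval(acc, se)
    print(f"{task:15s} acc={acc:.4f}  95% CI=[{low:.4f}, {high:.4f}]")
```

Intervals that overlap between two checkpoints suggest the difference on that task may not be meaningful under this approximation.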
## MMLU
| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.2536|± |0.0037|
| - humanities | 2|none | |acc |↑ |0.2667|± |0.0064|
| - other | 2|none | |acc |↑ |0.2475|± |0.0077|
| - social sciences| 2|none | |acc |↑ |0.2337|± |0.0076|
| - stem | 2|none | |acc |↑ |0.2594|± |0.0078|
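One way to read these MMLU numbers is against the random-guess baseline. The sketch below assumes four answer options per question, so chance accuracy is 0.25, and reports how many standard errors each score sits from that baseline. The helper name `z_score` is illustrative and the values are copied from the table above.

```python
# Distance of each MMLU score from the random-guess baseline, in stderr units.
# Assumption: four answer options per question, so chance accuracy is 0.25.
CHANCE = 0.25

mmlu = {
    "mmlu": (0.2536, 0.0037),
    "humanities": (0.2667, 0.0064),
    "other": (0.2475, 0.0077),
    "social sciences": (0.2337, 0.0076),
    "stem": (0.2594, 0.0078),
}


def z_score(value: float, stderr: float, baseline: float = CHANCE) -> float:
    """Return (value - baseline) / stderr."""
    return (value - baseline) / stderr


for group, (acc, se) in mmlu.items():
    print(f"{group:16s} acc={acc:.4f}  z={z_score(acc, se):+.2f}")
```

Under this approximation, the overall score of 0.2536 lies within one standard error of 0.25.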