---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Details

This is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu using the C_AdamW (Cautious AdamW) optimizer. It was trained on 20B tokens, following the 20× Chinchilla rule.

# How to use

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)
print(pipe("The key to life is"))
```

# Downstream Eval

## ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA

```shell
lm_eval --model hf \
    --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
    --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
    --device cuda:7 \
    --batch_size 8
```

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------------|------:|------|-----:|----------|---|------:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ | 0.2730|± |0.0130|
| | |none | 0|acc_norm |↑ | 0.2765|± |0.0131|
|arc_easy | 1|none | 0|acc |↑ | 0.5960|± |0.0101|
| | |none | 0|acc_norm |↑ | 0.5290|± |0.0102|
|hellaswag | 1|none | 0|acc |↑ | 0.3442|± |0.0047|
| | |none | 0|acc_norm |↑ | 0.4122|± |0.0049|
|lambada_openai| 1|none | 0|acc |↑ | 0.3264|± |0.0065|
| | |none | 0|perplexity|↓ |39.7510|± |1.6063|
|openbookqa | 1|none | 0|acc |↑ | 0.2200|± |0.0185|
| | |none | 0|acc_norm |↑ | 0.3300|± |0.0210|
|piqa | 1|none | 0|acc |↑ | 0.6872|± |0.0108|
| | |none | 0|acc_norm |↑ | 0.6850|± |0.0108|

## MMLU

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.2536|± |0.0037|
| - humanities | 2|none | |acc |↑ |0.2667|± |0.0064|
| - other | 2|none | |acc |↑ |0.2475|± |0.0077|
| - social sciences| 2|none | |acc |↑ |0.2337|± |0.0076|
| - stem | 2|none | |acc |↑ |0.2594|± |0.0078|
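
The MMLU command was not recorded in this card; a plausible reconstruction, assuming the same lm_eval setup as the other tasks (same model args, device, and batch size; only `--tasks` changes), is:

```shell
# Assumed invocation mirroring the eval command above; not the verified original.
lm_eval --model hf \
    --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
    --tasks mmlu \
    --device cuda:7 \
    --batch_size 8
```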