🇩🇪 nanochat German: Base Model
This repository hosts the first German nanochat base model.
It was pretrained with a modified version of the awesome nanochat implementation from Andrej Karpathy. Training ran on 8x A100 GPUs from Lambda.
Dataset
The German nanochat model was trained on a subset of the LLäMmlein pretraining dataset, which itself is a strict subset of the German portion of the RedPajama V2 dataset.
More information can be found in the corresponding dataset repository.
Stats
- run: nanochat-german
- device_type:
- depth: 20
- max_seq_len: 2048
- num_iterations: -1
- target_flops: -1.0000
- target_param_data_ratio: 20
- device_batch_size: 32
- total_batch_size: 524,288
- embedding_lr: 0.2000
- unembedding_lr: 0.0040
- weight_decay: 0.0000
- matrix_lr: 0.0200
- grad_clip: 1.0000
- eval_every: 250
- eval_tokens: 10,485,760
- core_metric_every: 2000
- core_metric_max_per_task: 500
- sample_every: 2000
- model_tag:
- Number of parameters: 560,988,160
- Number of FLOPs per token: 3.491758e+09
- Calculated number of iterations: 21,400
- Number of training tokens: 11,219,763,200
- Tokens : Params ratio: 20.0000
- DDP world size: 8
- warmup_ratio: 0.0000
- warmdown_ratio: 0.2000
- final_lr_frac: 0.0000
- Minimum validation bpb: 0.7735
- Final validation bpb: 0.7735
- CORE metric estimate: 0.0436
- MFU %: 20.89%
- Total training flops: 3.917670e+19
- Total training time: 396.61m
- Peak memory usage: 75374.27MiB
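Several of the derived values in the list above follow from the configured ones by simple arithmetic. The following is a minimal sketch that recomputes them from the numbers reported in this card. The variable names are illustrative, and treating `total_batch_size` as a token count is an assumption that the numbers happen to support; this is not the nanochat training code itself.

```python
# Values copied from the stats list in this card.
num_params = 560_988_160
flops_per_token = 3.491758e9
target_param_data_ratio = 20
total_batch_size = 524_288   # assumed to be tokens per optimizer step
device_batch_size = 32       # sequences per device
max_seq_len = 2048
world_size = 8               # DDP world size

# Token budget: tokens-to-params ratio times parameter count.
train_tokens = target_param_data_ratio * num_params

# Optimizer steps needed to consume that budget.
num_iterations = train_tokens // total_batch_size

# Tokens processed by all devices in one forward/backward pass,
# and the gradient accumulation factor that results from it.
tokens_per_micro_step = device_batch_size * max_seq_len * world_size
grad_accum_steps = total_batch_size // tokens_per_micro_step

# Total compute: FLOPs per token times number of training tokens.
total_flops = flops_per_token * train_tokens

print(f"training tokens:  {train_tokens:,}")
print(f"iterations:       {num_iterations:,}")
print(f"grad accum steps: {grad_accum_steps}")
print(f"total FLOPs:      {total_flops:.6e}")
```

Under these assumptions the recomputed token count, iteration count, and total FLOPs agree with the reported values, and the per-step token count of the eight devices already equals `total_batch_size`, so no gradient accumulation would be needed.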
Loss
- train bpb: 0.7678
- val bpb: 0.7736
Here are some examples from the eval prompts:
- sample 0: <|bos|>Die Hauptstadt von Frankreich ist Paris. Die Stadt ist die Hauptstadt von Frankreich und hat 2,5 Millionen
- sample 1: <|bos|>Das chemische Symbol von Gold ist Ag. Es ist ein silberweißes Metall, das in der Natur in der
- sample 2: <|bos|>Wenn gestern Freitag war, dann ist morgen Freitag. Und wenn heute Freitag ist, dann ist morgen Freitag. Und wenn morgen
- sample 3: <|bos|>Das Gegenteil von heiß ist kalt
- sample 4: <|bos|>Die Planeten des Sonnensystems sind: Sonne, Mond, Merkur, Venus, Mars, Jupiter, Saturn, Uranus
- sample 5: <|bos|>Meine Lieblingsfarbe ist Blau. Ich mag es, wenn es ein bisschen wärmer ist als die anderen Farben
- sample 6: <|bos|>Wenn 5*x + 3 = 13, dann ist x = 13, dann ist 5*x + 3 = 13
Evaluation
Results of the base model on the evaluation dataset:
- Model: base_model (step 21400)
- CORE metric: 0.0653
- hellaswag_zeroshot: 0.1790
- hellaswag: 0.1725
- copa: 0.1600
- boolq: -0.0293
- mmlu_zeroshot: -0.0019
- mmlu_fewshot: 0.0096
- enterprise_pii_classification: -0.0330
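The reported CORE value matches the plain average of the seven task scores listed above. The sketch below checks this. It assumes, which the card does not state, that the per-task numbers are already centered against a random-guess baseline; that would explain why some of them are negative.

```python
# Per-task scores copied from the evaluation list in this card.
task_scores = {
    "hellaswag_zeroshot": 0.1790,
    "hellaswag": 0.1725,
    "copa": 0.1600,
    "boolq": -0.0293,
    "mmlu_zeroshot": -0.0019,
    "mmlu_fewshot": 0.0096,
    "enterprise_pii_classification": -0.0330,
}

# Unweighted mean over all tasks.
core = sum(task_scores.values()) / len(task_scores)
print(f"CORE (mean of task scores): {core:.4f}")
```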
Demo
To generate some text, please make sure that you are using this specific HF branch.
Then the following code can be used:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "stefan-it/nanochat-german-base"
revision = "main"
max_new_tokens = 64

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load tokenizer and model from the same revision; the weights are loaded in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

# This is a base model, so the prompt is simply continued.
prompt = "Die Altstadt von München "
generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device, max_new_tokens=max_new_tokens)
outputs = generator(prompt)
print(outputs)
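The `text-generation` pipeline returns a list of dicts with a `generated_text` key, and by default that text starts with the prompt. The following sketch shows one way to keep only the continuation. The helper name `continuations` is hypothetical, and the sample data is a placeholder, not actual model output.

```python
def continuations(outputs, prompt):
    """Return each generated text without the leading prompt."""
    texts = []
    for item in outputs:
        text = item["generated_text"]
        # Only strip the prompt when the text really starts with it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        texts.append(text)
    return texts


# Placeholder standing in for `generator(prompt)`; not real model output.
prompt = "Die Altstadt von München "
fake_outputs = [{"generated_text": prompt + "<continuation>"}]
print(continuations(fake_outputs, prompt))
```

Passing `return_full_text=False` to the pipeline call should have a similar effect without a helper.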
License
The model is licensed under the permissive Apache 2.0 license.
Acknowledgements
- Many thanks to Andrej Karpathy for the original nanochat repo!
- Thanks to the LLäMmlein team for making the pretraining data publicly available.
- Thanks to Ben and Joshua for their help and for working on the nanochat HF integration.