🇩🇪 nanochat German: Base Model

This repository hosts the first base German nanochat model.

It was pretrained with a modified version of the awesome nanochat implementation by Andrej Karpathy. Training ran on 8x A100 GPUs from Lambda.

Dataset

The German nanochat model used a subset of the LLäMmlein pretraining dataset, which itself is a strict subset of the German portion of the RedPajama V2 dataset.

More information can be found in the corresponding dataset repository.

Stats

run: nanochat-german
device_type:
depth: 20
max_seq_len: 2048
num_iterations: -1
target_flops: -1.0000
target_param_data_ratio: 20
device_batch_size: 32
total_batch_size: 524,288
embedding_lr: 0.2000
unembedding_lr: 0.0040
weight_decay: 0.0000
matrix_lr: 0.0200
grad_clip: 1.0000
eval_every: 250
eval_tokens: 10,485,760
core_metric_every: 2000
core_metric_max_per_task: 500
sample_every: 2000
model_tag:
Number of parameters: 560,988,160
Number of FLOPs per token: 3.491758e+09
Calculated number of iterations: 21,400
Number of training tokens: 11,219,763,200
Tokens : Params ratio: 20.0000
DDP world size: 8
warmup_ratio: 0.0000
warmdown_ratio: 0.2000
final_lr_frac: 0.0000
Minimum validation bpb: 0.7735
Final validation bpb: 0.7735
CORE metric estimate: 0.0436
MFU %: 20.89%
Total training flops: 3.917670e+19
Total training time: 396.61m
Peak memory usage: 75374.27MiB
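
The derived figures in the stats above can be cross-checked from the configured values. The following is a minimal sketch, not the training script itself. The variable names mirror the log keys, and it assumes that `total_batch_size` is the number of tokens consumed per optimizer step and that `device_batch_size` counts sequences of `max_seq_len` tokens per device.

```python
# Values copied from the stats above.
num_params = 560_988_160
flops_per_token = 3.491758e9
target_param_data_ratio = 20
total_batch_size = 524_288   # tokens per optimizer step (assumption)
device_batch_size = 32       # sequences per device per micro-step (assumption)
max_seq_len = 2048
world_size = 8

# Token budget implied by the target tokens:params ratio.
target_tokens = target_param_data_ratio * num_params

# Optimizer steps needed to reach that budget.
num_iterations = target_tokens // total_batch_size

# Tokens actually seen, plus the resulting ratio and total compute.
train_tokens = num_iterations * total_batch_size
ratio = train_tokens / num_params
total_flops = flops_per_token * train_tokens

# Micro-steps accumulated per optimizer step.
tokens_per_micro_step = device_batch_size * max_seq_len * world_size
grad_accum_steps = total_batch_size // tokens_per_micro_step

print(f"iterations:        {num_iterations:,}")
print(f"training tokens:   {train_tokens:,}")
print(f"tokens per param:  {ratio:.4f}")
print(f"total FLOPs:       {total_flops:.6e}")
print(f"grad accum steps:  {grad_accum_steps}")
```

Under these assumptions the numbers agree with the reported 21,400 iterations, 11,219,763,200 training tokens, and 3.917670e+19 total training FLOPs, and one micro-step per device already fills the global batch.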

Loss

train bpb: 0.7678
val bpb: 0.7736
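
Bits per byte (bpb) normalizes the loss by the number of bytes in the text rather than the number of tokens, which makes values comparable across tokenizers. As a rough sketch of the conversion, assuming the loss is a summed cross-entropy in nats and the byte count is the UTF-8 length of the evaluated text (`bits_per_byte` is a hypothetical helper, not a nanochat function, and the inputs below are toy values):

```python
import math


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy in nats into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits.
    return total_nats / (math.log(2) * total_bytes)


# Toy example: 6 tokens, each predicted with probability 0.5 (1 bit each),
# covering 12 bytes of text.
toy_nats = 6 * -math.log(0.5)
print(bits_per_byte(toy_nats, 12))
```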

Here are some completions the model generated for the eval prompts:

- sample 0: <|bos|>Die Hauptstadt von Frankreich ist Paris. Die Stadt ist die Hauptstadt von Frankreich und hat 2,5 Millionen
- sample 1: <|bos|>Das chemische Symbol von Gold ist Ag. Es ist ein silberweißes Metall, das in der Natur in der
- sample 2: <|bos|>Wenn gestern Freitag war, dann ist morgen Freitag. Und wenn heute Freitag ist, dann ist morgen Freitag. Und wenn morgen
- sample 3: <|bos|>Das Gegenteil von heiß ist kalt
- sample 4: <|bos|>Die Planeten des Sonnensystems sind: Sonne, Mond, Merkur, Venus, Mars, Jupiter, Saturn, Uranus
- sample 5: <|bos|>Meine Lieblingsfarbe ist Blau. Ich mag es, wenn es ein bisschen wärmer ist als die anderen Farben
- sample 6: <|bos|>Wenn 5*x + 3 = 13, dann ist x = 13, dann ist 5*x + 3 = 13

Evaluation

Results of the final checkpoint on the evaluation tasks:

Model: base_model (step 21400)
CORE metric: 0.0653
hellaswag_zeroshot: 0.1790
hellaswag: 0.1725
copa: 0.1600
boolq: -0.0293
mmlu_zeroshot: -0.0019
mmlu_fewshot: 0.0096
enterprise_pii_classification: -0.0330
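
The CORE metric reported above matches the unweighted mean of the seven per-task scores, which the short sketch below recomputes. The negative values suggest the per-task scores are accuracies centered against a random baseline. That is an assumption here, and `centered` is a hypothetical helper showing one common way to do such centering.

```python
from statistics import mean

# Per-task scores copied from the results above.
task_scores = {
    "hellaswag_zeroshot": 0.1790,
    "hellaswag": 0.1725,
    "copa": 0.1600,
    "boolq": -0.0293,
    "mmlu_zeroshot": -0.0019,
    "mmlu_fewshot": 0.0096,
    "enterprise_pii_classification": -0.0330,
}

# Unweighted mean over tasks.
core = mean(task_scores.values())
print(f"CORE: {core:.4f}")


def centered(accuracy: float, random_baseline: float) -> float:
    """Rescale accuracy so that chance maps to 0 and a perfect score to 1.

    Hypothetical helper; both arguments are fractions in [0, 1).
    """
    return (accuracy - random_baseline) / (1.0 - random_baseline)
```

With this kind of centering, a score below zero means the model did worse than random guessing on that task.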

Demo

To generate some text, please make sure that you are using this specific HF branch.

Then the following code can be used:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "stefan-it/nanochat-german-base"
revision = "main"
max_new_tokens = 64
# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and the model weights (in bfloat16) from the given revision.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

prompt = "Die Altstadt von München "
# Wrap model and tokenizer in a text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=device, max_new_tokens=max_new_tokens)
outputs = generator(prompt)
# Prints a list of dicts, each with a "generated_text" entry.
print(outputs)

License

The model is licensed under the permissive Apache 2.0 license.

Acknowledgements

Many thanks to Andrej Karpathy for the original nanochat repo!
Thanks to the LLäMmlein team for making the pretraining data publicly available.
Thanks to Ben and Joshua for their help and for working on the nanochat HF integration.
