🇩🇪 nanochat German: v1
This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.
💬 A demo space of the model can be found here.
Datasets
The chat model was fine-tuned on the following datasets:
- German Alpaca
- German Dolly
- German Evol Instruct
- German Guanako
- German Openhermes
- German ShareGPT
- German Spelling Tasks
More information can be found in the corresponding German nanochat repository.
Fine-Tuning Stats
- run: nanochat-german
- device_type:
- dtype: bfloat16
- num_iterations: -1
- max_seq_len: 2048
- device_batch_size: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- init_lr_frac: 1.0000
- weight_decay: 0.0000
- eval_every: 150
- eval_tokens: 10,485,760
- total_batch_size: 524,288
- dry_run: 0
- Number of iterations: 346
- DDP world size: 8
- Minimum validation bpb: 0.6001
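The batch-size numbers above fit together, which a short calculation makes explicit. The sketch below assumes that `total_batch_size` and `eval_tokens` are counted in tokens and that one forward/backward pass across all GPUs consumes `device_batch_size * max_seq_len * ddp_world_size` tokens. The derived names (`tokens_per_micro_step`, `grad_accum_steps`, `eval_batches`, `total_training_tokens`) are illustrative and do not appear in the training logs.

```python
# Values copied from the fine-tuning stats listed above.
device_batch_size = 32       # sequences per GPU per forward/backward pass
max_seq_len = 2048           # tokens per sequence
ddp_world_size = 8           # number of GPUs (DDP world size)
total_batch_size = 524_288   # assumed to be tokens per optimizer step
num_iterations = 346         # reported number of iterations
eval_tokens = 10_485_760     # assumed to be tokens per evaluation run

# Tokens consumed by one forward/backward pass across all GPUs.
tokens_per_micro_step = device_batch_size * max_seq_len * ddp_world_size

# Assumption: gradient accumulation fills the gap between one pass
# and the requested total batch size.
grad_accum_steps = total_batch_size // tokens_per_micro_step

# Assumption: each evaluation run is split into passes of the same size.
eval_batches = eval_tokens // tokens_per_micro_step

# Rough token budget of the whole fine-tuning run.
total_training_tokens = num_iterations * total_batch_size

print(f"tokens per pass:        {tokens_per_micro_step:,}")
print(f"grad accumulation:      {grad_accum_steps}")
print(f"eval batches per run:   {eval_batches}")
print(f"total training tokens:  {total_training_tokens:,}")
```

Under these assumptions a single pass over the 8 GPUs already fills the full batch, so no gradient accumulation is needed, and the run sees roughly 181M tokens.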
Evaluation Results
We use lm_eval to measure and compare the model's performance against other language models in the same parameter range (note: this list is not exhaustive):
| Model | arc_de (acc) | arc_de (acc_norm) | hellaswag_de (acc) | hellaswag_de (acc_norm) | m_mmlu_de (acc) | truthfulqa_de_mc1 (acc) | truthfulqa_de_mc2 (acc) |
|---|---|---|---|---|---|---|---|
| nanochat German v1 | 0.2241 | 0.2626 | 0.3203 | 0.3581 | 0.2285 | 0.2500 | 0.4184 |
| LLäMmlein-120M | 0.1942 | 0.2301 | 0.2945 | 0.3178 | 0.2285 | 0.2310 | 0.4055 |
| LLäMmlein-1B | 0.2515 | 0.2960 | 0.3703 | 0.4490 | 0.2317 | 0.2322 | 0.3617 |
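To make the comparison easier to read, the following sketch computes the per-metric difference between nanochat German v1 and LLäMmlein-120M. All numbers are copied from the table above; the dictionary layout and the names `scores`, `metrics`, and `delta` are only illustrative.

```python
# Column order of the evaluation table: (task, metric).
metrics = [
    ("arc_de", "acc"), ("arc_de", "acc_norm"),
    ("hellaswag_de", "acc"), ("hellaswag_de", "acc_norm"),
    ("m_mmlu_de", "acc"),
    ("truthfulqa_de_mc1", "acc"), ("truthfulqa_de_mc2", "acc"),
]

# Scores copied from the table, in the same column order.
scores = {
    "nanochat German v1": [0.2241, 0.2626, 0.3203, 0.3581, 0.2285, 0.2500, 0.4184],
    "LLäMmlein-120M": [0.1942, 0.2301, 0.2945, 0.3178, 0.2285, 0.2310, 0.4055],
}

# Positive values mean nanochat German v1 scores higher on that metric.
delta = {
    m: round(a - b, 4)
    for m, a, b in zip(metrics, scores["nanochat German v1"], scores["LLäMmlein-120M"])
}

for (task, metric), diff in delta.items():
    print(f"{task:<20} {metric:<9} {diff:+.4f}")
```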
The following command was used to obtain the evaluation results for our model:
lm_eval --model hf \
    --model_args pretrained="stefan-it/nanochat-german-v1" \
    --tasks "arc_de,hellaswag_de,m_mmlu_de,truthfulqa_de_mc1,truthfulqa_de_mc2" \
    --device cuda:0 \
    --batch_size auto \
    --trust_remote_code \
    --log_samples \
    --output_path ./nanochat-german-v1
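Once the command has finished, the scores can be collected from the output directory. The sketch below rests on two assumptions about the harness output: that it writes a `results_*.json` file somewhere under `--output_path`, and that each task entry stores its metrics under keys such as `acc,none` and `acc_norm,none`. The helper names `load_latest_results` and `summarize` are hypothetical, and the file layout may differ between lm_eval versions.

```python
import json
from pathlib import Path


def load_latest_results(output_dir):
    """Return the "results" dict of the newest results_*.json under output_dir.

    Returns an empty dict if no such file exists (assumed file naming).
    """
    files = sorted(
        Path(output_dir).rglob("results_*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    if not files:
        return {}
    with files[-1].open(encoding="utf-8") as f:
        return json.load(f).get("results", {})


def summarize(results):
    """Collect (task, metric, value) rows for acc and acc_norm only."""
    rows = []
    for task, task_metrics in sorted(results.items()):
        for key, value in sorted(task_metrics.items()):
            # Assumed key format: "<metric>,<filter>", e.g. "acc,none".
            name = key.split(",")[0]
            if name in ("acc", "acc_norm") and isinstance(value, (int, float)):
                rows.append((task, name, value))
    return rows


# Directory taken from the --output_path argument of the command above.
for task, metric, value in summarize(load_latest_results("./nanochat-german-v1")):
    print(f"{task:<20} {metric:<9} {value:.4f}")
```

If the directory does not exist or contains no results file, the loop simply prints nothing.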
Demo
To generate some text, please make sure that you are using this specific HF branch.
Then the following code can be used:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stefan-it/nanochat-german-v1"
revision = "main"
max_new_tokens = 64

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Was ist die Hauptstadt von Bayern?"},
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
License
The model is licensed under the permissive Apache 2.0 license.
Acknowledgements
- Many thanks to Andrej Karpathy for the original nanochat repo!
- Thanks to the LLäMmlein team for making the pretraining data publicly available.
- Thanks to Ben and Joshua for their help and their work on the nanochat HF integration.
Base model: stefan-it/nanochat-german-base