🇩🇪 nanochat German: v1

nanochat German logo

This repository hosts the first German nanochat model. It was fine-tuned (in the mid-training phase) on various German SFT datasets.

💬 A demo Space for the model can be found here.

Datasets

The chat model was fine-tuned on several German SFT datasets. More information can be found in the corresponding German nanochat repository.

Fine-Tuning Stats

  • run: nanochat-german
  • device_type:
  • dtype: bfloat16
  • num_iterations: -1
  • max_seq_len: 2048
  • device_batch_size: 32
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • init_lr_frac: 1.0000
  • weight_decay: 0.0000
  • eval_every: 150
  • eval_tokens: 10,485,760
  • total_batch_size: 524,288
  • dry_run: 0
  • Number of iterations: 346
  • DDP world size: 8
  • Minimum validation bpb (bits per byte): 0.6001
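
For reference, the effective batch setup can be sanity-checked from the numbers above. A minimal sketch, assuming nanochat's convention that total_batch_size counts tokens per optimizer step and that gradient accumulation is derived from it (variable names are illustrative, not taken from the training code):

```python
# Sanity check of the batch configuration above (illustrative, not the
# actual training code).
device_batch_size = 32      # sequences per device per micro-step
max_seq_len = 2048          # tokens per sequence
ddp_world_size = 8          # number of GPUs
total_batch_size = 524_288  # tokens per optimizer step

tokens_per_micro_step = device_batch_size * max_seq_len * ddp_world_size
assert total_batch_size % tokens_per_micro_step == 0
grad_accum_steps = total_batch_size // tokens_per_micro_step
print(grad_accum_steps)  # -> 1, i.e. no gradient accumulation is needed
```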

Evaluation Results

We use lm_eval (the EleutherAI LM Evaluation Harness) to measure the model's performance and compare it against other language models in a similar parameter range (note: the list is not exhaustive):

| Model | arc_de (acc) | arc_de (acc_norm) | hellaswag_de (acc) | hellaswag_de (acc_norm) | m_mmlu_de (acc) | truthfulqa_de_mc1 (acc) | truthfulqa_de_mc2 (acc) |
|---|---|---|---|---|---|---|---|
| nanochat German v1 | 0.2241 | 0.2626 | 0.3203 | 0.3581 | 0.2285 | 0.2500 | 0.4184 |
| LLäMmlein-120M | 0.1942 | 0.2301 | 0.2945 | 0.3178 | 0.2285 | 0.2310 | 0.4055 |
| LLäMmlein-1B | 0.2515 | 0.2960 | 0.3703 | 0.4490 | 0.2317 | 0.2322 | 0.3617 |

The following command was used to obtain the evaluation results for our model:

lm_eval --model hf \
--model_args pretrained="stefan-it/nanochat-german-v1" \
--tasks "arc_de,hellaswag_de,m_mmlu_de,truthfulqa_de_mc1,truthfulqa_de_mc2" \
--device cuda:0 \
--batch_size auto \
--trust_remote_code \
--log_samples \
--output_path ./nanochat-german-v1
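
The scores in the table above can be read back from the JSON that lm_eval writes below --output_path. A minimal sketch, assuming a recent lm-evaluation-harness version that stores one results_*.json per run with metric keys such as "acc,none" (file layout and key names vary between harness versions):

```python
import json
from pathlib import Path

# lm_eval writes a results_*.json file below the --output_path directory;
# the exact layout depends on the harness version.
results_file = next(Path("./nanochat-german-v1").rglob("results_*.json"))
results = json.loads(results_file.read_text())["results"]

for task, metrics in results.items():
    # Metric keys look like "acc,none" / "acc_norm,none" in recent versions.
    scores = {k: v for k, v in metrics.items()
              if k.split(",")[0] in ("acc", "acc_norm")}
    print(task, scores)
```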

Demo

To generate some text, please make sure that you are using this specific HF branch.
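
Assuming the linked branch is a transformers development branch, it can be installed directly from Git. A minimal sketch; `<branch-name>` is a placeholder for the branch linked above, not an actual branch name:

```bash
# Install transformers from a specific Git branch (replace <branch-name>
# with the branch linked above).
pip install "git+https://github.com/huggingface/transformers.git@<branch-name>"
```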

Then the following code can be used:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer


model_id = "stefan-it/nanochat-german-v1"
revision = "main"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and the model (in bfloat16) from the pinned revision
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Was ist die Hauptstadt von Bayern?"},
]

# Tokenize the conversation using the model's chat template and append
# the assistant generation prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    # generate() defaults to greedy decoding; pass do_sample=True for sampling
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
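
Alternatively, the same chat flow can be wrapped in a transformers text-generation pipeline, which applies the chat template internally. A minimal sketch; the `device_map="auto"` placement (which requires accelerate) and the `dtype` argument mirror the example above and are assumptions, not tested settings:

```python
import torch
from transformers import pipeline

# The text-generation pipeline accepts chat-style message lists directly
# and applies the model's chat template internally.
pipe = pipeline(
    "text-generation",
    model="stefan-it/nanochat-german-v1",
    revision="main",
    dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; use device=0 as an alternative
)

conversation = [
    {"role": "user", "content": "Was ist die Hauptstadt von Bayern?"},
]

result = pipe(conversation, max_new_tokens=64)
# For chat input, generated_text is the message list with the assistant
# reply appended at the end.
print(result[0]["generated_text"][-1]["content"])
```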

License

The model is licensed under the permissive Apache 2.0 license.

Acknowledgements
