In a Training Loop 🔄

Stefan Schweter PRO · stefan-it · https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨

Recent Activity

liked a dataset 1 day ago

bltlab/open-ner-standardized

commented on a paper 6 days ago

Bolmo: Byteifying the Next Generation of Language Models

upvoted a paper 6 days ago

Bolmo: Byteifying the Next Generation of Language Models

upvoted a paper 6 days ago

Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published 7 days ago • 11

upvoted a paper 7 days ago

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published 9 days ago • 14

upvoted an article 19 days ago

We Got Claude to Fine-Tune an Open Source LLM

Article • 21 days ago • 530

upvoted a paper 20 days ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published about 1 month ago • 273

upvoted an article 22 days ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • 24 days ago • 252

upvoted a changelog 26 days ago

Add a Status to your Hugging Face profile

Changelog • 26 days ago • 85

upvoted a paper 27 days ago

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining

Paper • 2511.21613 • Published 28 days ago • 2

upvoted a paper about 1 month ago

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12 • 93

upvoted an article about 1 month ago

Building for an Open Future - our new partnership with Google Cloud

Article • Nov 13 • 47

upvoted a paper about 1 month ago

Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements

Paper • 2511.05560 • Published Nov 4 • 1

upvoted a collection about 1 month ago

Pre-training Dataset Samples

A collection of pre-training dataset samples of sizes 10M, 100M and 1B tokens. Ideal for use in quick experimentation and ablations. • 19 items • Updated Nov 11 • 16

upvoted a paper about 1 month ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5 • 125

upvoted an article about 1 month ago

SYNTH: the new data frontier

Article • Nov 10 • 6

upvoted 2 articles about 2 months ago

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Article • Nov 3 • 51

Streaming datasets: 100x More Efficient

Article • Oct 27 • 75

upvoted a collection about 2 months ago

HPLT3

Everything for high-quality filtering of HPLT3 • 1 item • Updated Nov 3 • 1

upvoted 2 papers about 2 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11 • 3

Gaperon: A Peppered English-French Generative Language Model Suite

Paper • 2510.25771 • Published Oct 29 • 15

upvoted a collection about 2 months ago

gpt-oss-safeguard

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated Oct 29 • 58

upvoted a paper about 2 months ago

A Survey on LLM Mid-training

Paper • 2510.23081 • Published Oct 27 • 1