---
title: "Maintain the unmaintainable:\n1M python loc, 400+ models"
subtitle: "A peek into software engineering for the transformers library"
description: "A peek into software engineering for the transformers library"
authors:
  - name: "Pablo Montalvo"
    url: "https://huggingface.co/Molbap"
    affiliations: [1]
  - name: "Lysandre Debut"
    url: "https://huggingface.co/Lysandre"
    affiliations: [1]
  - name: "Pedro Cuenca"
    url: "https://huggingface.co/pcuenq"
    affiliations: [1]
  - name: "Yoni Gozlan"
    url: "https://huggingface.co/yonigozlan"
    affiliations: [1]
affiliations:
  - name: "Hugging Face"
    url: "https://huggingface.co"
published: "October 6, 2025"
tags: [transformers, engineering, design-philosophy]
tableOfContentsAutoCollapse: true
acknowledgements: "Special thanks to all the reviewers on this! Vaibhav Srivastav for his thoroughness, Cyril Vallez for his eagle eye, Yoni Gozlan (also for his excellent work on fast image processors), Arthur Zucker for his guidance, and of course the wonderful Thibaud Frere for designing this template and helping me out with it!
Most importantly: thanks to the entire Open-Source community, sincerely."
---
import HtmlEmbed from "../components/HtmlEmbed.astro";
import Stack from "../components/Stack.astro";
import FullWidth from "../components/FullWidth.astro";
import Wide from "../components/Wide.astro";
import Note from "../components/Note.astro";
import Image from "../components/Image.astro";
import Glossary from "../components/Glossary.astro";
import Tenet from "../components/Tenet.astro";
import Reference from "../components/Reference.astro";
import llamaGlmAttn from "./assets/image/llama_glm_attn.png";
import llamaCenter from "./assets/image/llama_center.png";
import clusterWave2vec2 from "./assets/image/cluster_wave2vec2.png";
import detrIsland from "./assets/image/detr_island.png";
import bigPictureZoomout from "./assets/image/big_picture_zoomout.png";
import timelineLlava from "./assets/image/timeline_llava.png";
import classicEncoders from "./assets/image/classic_encoders.png";
import stillGraphBloat from "./assets/image/still_graph_bloat.png";
import fastImageProcessors from "./assets/image/fast_image_processors.png";
import modelDebugger from "./assets/image/model_debugger.png";
## Preface
One million lines of `Python` code. Through them, the [`transformers`](https://github.com/huggingface/transformers) library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.
Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.
This scale presents a monumental engineering challenge.
How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates?
We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.
This post dissects the design philosophy that makes this possible. It's the result of an evolution from our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, we published a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers), focusing on what makes the library faster today; we strongly recommend reading it. All of these developments are only made possible by these principles.
We formalize and articulate the "tenets" that have been guiding our development, demonstrate how they are implemented in code, and show the measurable impact they have on the library's sustainability and growth.
For any OSS maintainer, power user, or contributor, this is a map to understanding, using, and building upon `transformers`. It goes further than that, too: any project of comparable size forces deep choices, not only about design and abstraction, but about the very mindset of the software you are building. These tenets may or may not apply to your project, but they offer a glimpse into how we work that could prove helpful or inspiring.
Conventions used throughout this post: the tenets that guide our development are numbered and referenced repeatedly, so we state them up front, each followed by a one-line takeaway.
We aim to be the [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but a guideline that drives our decisions: model implementations should be reliable, reproducible, and faithful to the original performance. This overarching guideline ensures quality and reproducibility across all models in the library.

1. All inference and training core logic has to be visible, top-to-bottom, to maximize each model's hackability. *Every model should be understandable and hackable by reading a single file from top to bottom.*
2. Optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words: readability is paramount. *Code quality matters as much as functionality; optimize for human readers, not just computers.*
3. If it's model behavior, keep it in the file; abstractions are only for generic infrastructure. *Model-specific logic belongs in the model file, not hidden behind abstractions.*
4. Copy when it helps users; keep successors in sync without centralizing behavior. Evolution: with the introduction and global adoption of modular transformers, we no longer repeat any logic in the modular files, but end-user modeling files remain faithful to the original tenet (a sketch of a modular file follows this list). *Strategic duplication can improve readability and maintainability when done thoughtfully.*
5. Config, model, pre-processing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the least amount of code paths; reading should be obvious, configurations should be obvious (a sketch of this surface also follows the list). *Keep the public interface simple and predictable; users should know what to expect.*
6. Evolve by additive standardization, never break public APIs. Any artifact that was once on the Hub and worked with transformers should be usable indefinitely with the same interface; further, public methods should not change, to avoid breaking dependencies. *Once something is public, it stays public: evolution through addition, not breaking changes.*
7. Same argument names, same outputs, hidden states and attentions exposed, enforced by tests. This is a goal as well as a tenet. *All models should feel familiar: consistent interfaces reduce cognitive load.*
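To make tenet 4's evolution concrete, here is a minimal, hypothetical sketch of a modular file. The model name (`MyModel`) is made up for illustration; the mechanism, declaring reuse in a `modular_*.py` by subclassing another model's components, which is then expanded into a full, self-contained `modeling_*.py`, is the real one.

```python
# Hypothetical modular_mymodel.py: declare reuse by subclassing existing components.
# The modular converter expands this into a full modeling_mymodel.py, so the
# end-user file still reads top to bottom with nothing hidden behind abstractions.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM


class MyModelConfig(LlamaConfig):
    model_type = "mymodel"  # only the deltas from Llama need to be spelled out


class MyModelAttention(LlamaAttention):
    pass  # inherit Llama's attention unchanged; any tweaks would live here


class MyModelForCausalLM(LlamaForCausalLM):
    config_class = MyModelConfig
```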
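And a sketch of the minimal user API of tenet 5: the same handful of entry points for any architecture, shown here with `gpt2` purely as an example checkpoint.

```python
# Load, run, save, share: the same calls work for any architecture on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # pre-processing
model = AutoModelForCausalLM.from_pretrained("gpt2")   # config + weights

inputs = tokenizer("Hello, transformers!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

model.save_pretrained("./my-gpt2")                     # round-trip to disk
# model.push_to_hub("username/my-gpt2")                # or share on the Hub (requires auth)
```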