Molbap HF Staff committed on
Commit 530e759 · 1 Parent(s): 165284f
app/dist/index.html CHANGED
The diff for this file is too large to render. See raw diff
 
app/dist/index.html.gz CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:f073b768a6cf51ffeccf975a1448aa1ecf36a3fd42daf0626356f5bdb01597f4
- size 64292
+ oid sha256:e6ba9616dba9dfa5cc9ef3815f320ccb912d7e70851a5a4e7888917e1eebce23
+ size 64010
app/src/content/article.mdx CHANGED
@@ -47,13 +47,15 @@ import modelDebugger from "./assets/image/model_debugger.png";

## Preface

- One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.
+ One million lines of `Python` code. Through them, the [`transformers`](https://github.com/huggingface/transformers) library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

- Built on `PyTorch`, transformers is a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.
+ Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

- How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
+ How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates?
+
+ We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.

This post dissects the design philosophy that makes this possible. It's the result of an evolution from our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently (and we strongly recommend the read) we published a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers), focusing on what makes the library faster today. All of these developments are only made possible thanks to these principles.
@@ -88,26 +90,27 @@ These principles were not decided in a vacuum. The library _evolved_ towards the
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
- <p>We aim to be a [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is more of a goal than a tenet, but it strongly guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original implementations. If we are successful, they should become reference baselines for the ecosystem, so they'll be easily adopted by downstream libraries and projects. It's much easier for a project to always refer to the transformers implementation, than to learn a different research codebase every time a new architecture is released.</p>
- <em>This overarching guideline ensures quality and reproducibility across all models in the library, and aspires to make the community work easier.</em>
+ <p>We aim to be the [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but something that guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.</p>
+ <em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
</li>

<li class="tenet">
<a id="one-model-one-file"></a>
<strong>One Model, One File</strong>
<p>All inference and training core logic has to be visible, top‑to‑bottom, to maximize each model's hackability.</p>
- <em>Every model should be completely understandable and hackable by reading a single file from top to bottom.</em>
+ <em>Every model should be understandable and hackable by reading a single file from top to bottom.</em>
</li>
+
<li class="tenet">
<a id="code-is-product"></a>
- <strong>Code is Product</strong>
- <p>Optimize for reading, diffing, and tweaking, our users are power users. Variables should be explicit, full words, even several words, readability is primordial.</p>
+ <strong>Code is the Product</strong>
+ <p>Optimize for reading, diffing, and tweaking: our users are power users. Variables can be explicit, full words, even several words; readability is primordial.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
- <p>If it's model behavior, keep it in the file; use abstractions only for generic infra.</p>
+ <p>If it's model behavior, keep it in the file; abstractions are only for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
@@ -121,14 +124,14 @@ These principles were not decided in a vacuum. The library _evolved_ towards the
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
- <p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious.</p>
+ <p>Config, model, pre-processing; from_pretrained, save_pretrained, push_to_hub. We want the fewest codepaths possible. Reading should be obvious, configurations should be obvious.</p>
<em>Keep the public interface simple and predictable, users should know what to expect.</em>
</li>
<li class="tenet">
<a id="backwards-compatibility"></a>
<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
- <p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies. If we do deprecate something, it's with very long cycles beforehand.</p>
+ <p>Any artifact that was once on the hub and worked with transformers should be usable indefinitely with the same interface. Further, public methods should not change, to avoid breaking dependencies.</p>
<em>Once something is public, it stays public, evolution through addition, not breaking changes.</em>
</li>
<li class="tenet">
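To make the Minimal User API tenet above concrete, here is a minimal sketch of the surface it describes: the same `from_pretrained`, `save_pretrained`, and `push_to_hub` entry points for config, preprocessing, and model. The `gpt2` checkpoint and local paths are only examples.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # any Hub checkpoint follows the same flow

# The whole public surface in a few lines: config, preprocessing, model.
config = AutoConfig.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("One model, one file.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Saving and sharing mirror loading.
model.save_pretrained("./my-gpt2")
tokenizer.save_pretrained("./my-gpt2")
# model.push_to_hub("my-username/my-gpt2")  # requires a Hub login; shown for completeness
```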
@@ -158,15 +161,15 @@ def rotate_half(x):
```


- We want all models to have self-contained modeling code. Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.
-
- This comes at a great cost. For years, we have used what we call the `#Copied from...` mechanism: we added comments of a specific format documenting that some code was copied from another model, saving time both for the reviewers and for the CI: we had tooling to ensure that the copied blocks remained in sync.
-
- But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet, we could not remove them.
-
- We needed to separate two principles that were so far intertwined, <Tenet term="do-repeat-yourself" display="repetition" position="top" /> and <Tenet term="one-model-one-file" display="hackability" position="top" />.
-
- What is the solution to this? Let's talk about modular transformers.
+ We want all models to have self-contained modeling code.
+
+ Each core functionality _must_ be in the modeling code; every non-core functionality _can_ be outside of it.
+
+ This comes at a great cost. Enter the `#Copied from...` mechanism: for a long time, these comments indicated that some code was copied from another model, saving time both for the reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
+
+ We need to separate two principles that were so far intertwined, <Tenet term="do-repeat-yourself" display="repetition" position="top" /> and <Tenet term="one-model-one-file" display="hackability" position="top" />.
+
+ What's the solution to this?

<Note variant="info">
<strong>TL;DR:</strong> Read the code in one place, <Tenet term="one-model-one-file" display="one model, one file." position="top" />. Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>).
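Both mechanisms discussed above can be illustrated with a hedged sketch (the `MyModel` name is a placeholder, not a real architecture): the `#Copied from...` marker is a structured comment that tooling keeps in sync with its source, while a `modular_*.py` file declares only its differences from a reference model and is expanded into a full, self-contained `modeling_*.py`.

```python
import torch
import torch.nn as nn

from transformers.models.llama.modeling_llama import LlamaMLP

# The pre-modular mechanism: a structured comment, checked by CI, stating that the
# body below must stay identical to its source (modulo the name swap).
# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MyModel
class MyModelRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)


# The modular mechanism: a modular_mymodel.py only states what differs from the
# reference model; the generated modeling_mymodel.py contains the expanded code.
class MyModelMLP(LlamaMLP):
    pass  # inherited unchanged, re-expanded in full in the generated modeling file
```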
@@ -486,7 +489,7 @@ Parallelization is specified in the configuration (<code>tp_plan</code>), not the

### <a id="layers-attentions-caches"></a> Layers, attentions and caches

- Following the same logic, the _nature_ of attention and per-layer caching should not be hardcoded. We should be able to specify in the configuration how each layer is implemented. Thus, we define a mapping like:
+ Following the same logic, the _nature_ of attention and caching per layer of a model should not be hardcoded. We should be able to specify, in a configuration-based fashion, how each layer is implemented. Thus we define a mapping that can then be referenced from the configuration:


```python
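# A hedged sketch of the idea with illustrative names, not the library's actual
# mapping: the configuration declares *what* each layer does, and the model
# resolves each label to a concrete implementation.
config_layer_types = ["sliding_attention", "full_attention",
                      "sliding_attention", "full_attention"]

def full_attention(q, k, v):
    ...  # global attention over the whole sequence

def sliding_window_attention(q, k, v, window=4096):
    ...  # attention restricted to a local window

ATTENTION_IMPLEMENTATIONS = {  # illustrative name for the label -> implementation map
    "full_attention": full_attention,
    "sliding_attention": sliding_window_attention,
}

per_layer_attention = [ATTENTION_IMPLEMENTATIONS[t] for t in config_layer_types]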
@@ -594,11 +597,9 @@ Llama-lineage is a hub; several VLMs remain islands — engineering opportunity

### Many models, but not enough yet, are alike

- I look into Jaccard similarity, which we use to measure set differences, to find similarities across models. I know that code is more than a set of characters stringed together. We also try code-embedding models that rank candidates better in practice, but for this post we stick to the deterministic Jaccard index.
-
- It is interesting, for our comparison, to look at _when_ we deployed the modular logic and what was its rippling effect on the library. Looking at the timeline makes it obvious: adding modular allowed to connect more and more models to solid reference points.
-
- Yet, we still have a lot of gaps to fill.
+ Next, I looked into Jaccard similarity, which we use to measure set differences. I know that code is more than a set of characters strung together. I also used code-embedding models to compare implementations, and they yielded better results, but for the needs of this blog post I will stick to the Jaccard index.
+
+ It is interesting, for that comparison, to look at _when_ we deployed this modular logic and what its ripple effect on the library was. You can check the [larger space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) to play around, but the gist is: adding modular allowed us to connect more and more models to solid reference points. We still have a lot of gaps to fill.

Zoom out below - it's full of models. You can click on a node to see its connections better, or use the text box to search for a model. You can use the [full viewer](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) (tab "timeline", hit "build timeline") for better exploration.
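For concreteness, the Jaccard index mentioned above is simply the size of the intersection over the size of the union of two sets; a minimal sketch over identifiers extracted from two modeling files (the file names are only an example):

```python
import re

def jaccard(source_a: str, source_b: str) -> float:
    """Jaccard index between two source files, viewed as sets of identifiers."""
    tokens_a = set(re.findall(r"[A-Za-z_]\w*", source_a))
    tokens_b = set(re.findall(r"[A-Za-z_]\w*", source_b))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Example usage with two modeling files (paths are illustrative):
# with open("modeling_llama.py") as a, open("modeling_mistral.py") as b:
#     print(f"Jaccard similarity: {jaccard(a.read(), b.read()):.2f}")
```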
 
@@ -722,11 +723,9 @@ This is an overall objective: there's no `transformers` without its community.

Having a framework means forcing users into it. It restrains flexibility and creativity, which are the fertile soil for new ideas to grow.

- Among the most valuable contributions to `transformers` is of course the addition of new models. Very recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
-
- These additions are immediately available for other models to use.
-
- Another important advantage is the ability to fine-tune and pipeline these models into many other libraries and tools. Check here on the hub how many finetunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
+ Among the most valuable contributions to `transformers` is of course the addition of new models. Recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
+
+ A second one is the ability to fine-tune and pipeline these models with many other libraries and tools. Check here on the hub how many finetunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!


<Note variant="info">
 