Merge branch 'main' of https://huggingface.co/spaces/Molbap/sneaky
- content/article.md +79 -22
- dist/fragments/memory-profiler.html +5 -50
- dist/hf-logo.svg +1 -0
- dist/index.html +91 -116
- dist/main.bundle.js +133 -2
- dist/main.bundle.js.map +0 -0
- dist/static/d3_dependency_graph.html +14 -7
- src/fragments/memory-profiler.html +10 -5
- src/transformers-custom.css +124 -0
- webpack.config.js +6 -6
content/article.md
CHANGED
|
@@ -1,13 +1,14 @@
|
|
| 1 |
-
# Digging through tenets and time
|
| 2 |
-
|
| 3 |
|
| 4 |
## Introduction
|
| 5 |
|
| 6 |
-
The `transformers` library, built with `PyTorch`, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, and classical encoders, for a total of almost 400 models.
|
| 7 |
-
|
| 8 |
-
|
| 9 |
|
| 10 |
-
Here we will dissect the design philosophy of transformers, as a continuation of the existing older [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and an accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). Some time ago (I dare not say how long), we discussed the state of things with the transformers maintainers. A lot of recent developments were satisfactory, but if we only talked about these, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.
|
| 11 |
|
| 12 |
### What you will learn
|
| 13 |
|
|
@@ -16,19 +17,67 @@ This will also showcase new features you might have missed so you'll be up-to-da
|
|
| 16 |
|
| 17 |
So, what are the principles of `transformers`? We will try to summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
|
| 33 |
|
| 34 |
When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with what is above.
|
|
@@ -59,7 +108,8 @@ As I was looking for things to improve and make better, it's one of the iteratio
|
|
| 59 |
|
| 60 |
However, both of these works were already pointing at some drawbacks, which have been iteratively addressed. [Transformers has gone modular](https://huggingface.co/docs/transformers/en/modular_transformers) , allowing a form of inheritance without breaking [One model, One file](#one-model-one-file). If you're familiar with this, you can [skip this section](#^attention-classes) and go to the next one.
|
| 61 |
|
| 62 |
-
We amended the principle of [DRY*](#do-repeat-yourself) by removing progressively
|
|
|
|
| 63 |
It is explained in detail in the documentation above, but overall it works like this: you define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_:
|
| 64 |
|
| 65 |
<summary>Auto-generated modeling code</summary>
|
|
@@ -84,7 +134,14 @@ It is a strength of the new attention interface, where it can be plugged in vari
|
|
| 84 |
|
| 85 |
For better _information_, we plan to use `python` features such as `Annotated` to inform users of what we typically expect in an argument. That way, higher-level information could be included directly in the type annotations.
|
| 86 |
|
| 87 |
-
##
|
| 88 |
|
| 89 |
The same principle extends to normalization, activation, and other hot paths. The model defines **semantics**; a kernel defines **how** to execute them faster. We annotate the module to borrow a community‑provided forward, keeping a [consistent public surface](#consistent-public-surface)
|
| 90 |
|
|
@@ -201,7 +258,7 @@ I took again a similarity measure and looked at the existing graphs. The tool is
|
|
| 201 |
|
| 202 |

|
| 203 |
|
| 204 |
-
## <a id="encoders-ftw"></a>
|
| 205 |
|
| 206 |
Model popularity speaks for itself! This is because encoder usage obviously lies in embeddings. So we have to keep the encoders viable, usable, and fine-tunable.
|
| 207 |
|
| 1 |
|
| 2 |
## Introduction
|
| 3 |
|
| 4 |
+
The `transformers` library, built with `PyTorch`, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, and classical encoders, for a total of almost 400 models.
|
| 5 |
+
The name of the library itself is mostly majority-driven, as many models are not even transformer architectures, like Mamba, Zamba, RWKV, and convolution-based models.
|
| 6 |
+
Regardless, each of these is wrought by the research and engineering team that created them, then harmonized into a now famous interface, and callable with a simple `.from_pretrained` command.
|
| 7 |
+
Inference works for all models; training is functional for most. The library is a foundation for many machine learning courses and cookbooks, and overall several thousand other open-source libraries depend on it. All models are tested as part of a daily CI ensuring their preservation and reproducibility. Most importantly, it is _open-source_ and has been written in large part by the community.
|
| 8 |
+
This isn't really to brag but to set the stakes: what does it take to keep such a ship afloat, made of so many moving, unrelated parts?
|
| 9 |
+
The ML wave has not stopped; more and more models are being added, at a steadily growing rate. `Transformers` is widely used, and we read the feedback that users post online: a function that had 300+ keyword arguments, duplicated code and helpers, mentions of `Copied from ...` everywhere, along with optimisation concerns. Text-only models are relatively tame, but multimodal models remain to be harmonized.
|
| 10 |
|
| 11 |
+
Here we will dissect the new design philosophy of transformers, as a continuation of the existing older [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and an accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). Some time ago (I dare not say how long), we discussed the state of things with the transformers maintainers. A lot of recent developments were satisfactory, but if we only talked about these, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.
|
| 12 |
|
| 13 |
### What you will learn
|
| 14 |
|
| 17 |
|
| 18 |
So, what are the principles of `transformers`? We will try to summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.
|
| 19 |
|
| 20 |
+
<div class="tenet-list">
|
| 21 |
+
<ol>
|
| 22 |
+
<li class="tenet">
|
| 23 |
+
<a id="source-of-truth"></a>
|
| 24 |
+
<strong>Source of Truth</strong>
|
| 25 |
+
<p>We should be a source of truth for all model definitions. This is not a tenet, but something that still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.</p>
|
| 26 |
+
<em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
|
| 27 |
+
</li>
|
| 28 |
+
|
| 29 |
+
<li class="tenet">
|
| 30 |
+
<a id="one-model-one-file"></a>
|
| 31 |
+
<strong>One Model, One File</strong>
|
| 32 |
+
<p>All inference (and most of training; the loss is separate, not part of the model) logic visible, top‑to‑bottom.</p>
|
| 33 |
+
<em>Every model should be completely understandable by reading a single file from top to bottom.</em>
|
| 34 |
+
</li>
|
| 35 |
+
<li class="tenet">
|
| 36 |
+
<a id="code-is-product"></a>
|
| 37 |
+
<strong>Code is Product</strong>
|
| 38 |
+
<p>Optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words; readability is primordial.</p>
|
| 39 |
+
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
|
| 40 |
+
</li>
|
| 41 |
+
<li class="tenet">
|
| 42 |
+
<a id="standardize-dont-abstract"></a>
|
| 43 |
+
<strong>Standardize, Don't Abstract</strong>
|
| 44 |
+
<p>If it's model behavior, keep it in the file; abstractions only for generic infra.</p>
|
| 45 |
+
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
|
| 46 |
+
</li>
|
| 47 |
+
<li class="tenet">
|
| 48 |
+
<a id="do-repeat-yourself"></a>
|
| 49 |
+
<strong>DRY* (DO Repeat Yourself)</strong>
|
| 50 |
+
<p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
|
| 51 |
+
<p><strong>Amendment:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet.</p>
|
| 52 |
+
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
|
| 53 |
+
</li>
|
| 54 |
+
<li class="tenet">
|
| 55 |
+
<a id="minimal-user-api"></a>
|
| 56 |
+
<strong>Minimal User API</strong>
|
| 57 |
+
<p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the fewest possible codepaths. Reading should be obvious, configurations should be obvious.</p>
|
| 58 |
+
<em>Keep the public interface simple and predictable - users should know what to expect.</em>
|
| 59 |
+
</li>
|
| 60 |
+
<li class="tenet">
|
| 61 |
+
<a id="backwards-compatibility"></a>
|
| 62 |
+
<strong>Backwards Compatibility</strong>
|
| 63 |
+
<p>Evolve by additive standardization, <strong>never</strong> break public APIs.</p>
|
| 64 |
+
<p><strong>Note:</strong> Some models see almost no use, and we have stopped adding new features for non-torch frameworks. Still, we adapt to models existing on the Hub.</p>
|
| 65 |
+
<em>Once something is public, it stays public - evolution through addition, not breaking changes.</em>
|
| 66 |
+
</li>
|
| 67 |
+
<li class="tenet">
|
| 68 |
+
<a id="consistent-public-surface"></a>
|
| 69 |
+
<strong>Consistent Public Surface</strong>
|
| 70 |
+
<p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests.</p>
|
| 71 |
+
<em>All models should feel familiar - consistent interfaces reduce cognitive load.</em>
|
| 72 |
+
</li>
|
| 73 |
+
<li class="tenet">
|
| 74 |
+
<a id="modular-toolbox"></a>
|
| 75 |
+
<strong>Modular Toolbox (Not Framework)</strong>
|
| 76 |
+
<p>We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is <em>better</em> for your model to be able to inherit from PreTrainedModel and have enabled TensorParallel, from_pretrained, sharding, push_to_hub, loss, as well as PEFT/TRL/SGLang/vLLM.</p>
|
| 77 |
+
<em>This is the largest change. Provide tools and utilities, but don't force users into a rigid framework.</em>
|
| 78 |
+
</li>
|
| 79 |
+
</ol>
|
| 80 |
+
</div>
|
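As a rough illustration of the "Minimal User API" tenet above, here is a toy, pure-Python stand-in (the `ToyConfig`/`ToyModel` names are invented for this sketch and are not transformers classes) showing the config/model/`from_pretrained`/`save_pretrained` shape the tenet describes:

```python
import json
import os
import tempfile

class ToyConfig:
    """Invented stand-in for a model config: plain, serializable attributes."""
    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size

class ToyModel:
    """Invented stand-in showing the minimal surface: one load path, one save path."""
    def __init__(self, config):
        self.config = config

    @classmethod
    def from_pretrained(cls, path):
        # Single, obvious codepath: read the config back from disk.
        with open(os.path.join(path, "config.json")) as f:
            return cls(ToyConfig(**json.load(f)))

    def save_pretrained(self, path):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "config.json"), "w") as f:
            json.dump(vars(self.config), f)

# Round trip: save, reload, same config.
with tempfile.TemporaryDirectory() as d:
    ToyModel(ToyConfig(hidden_size=32)).save_pretrained(d)
    reloaded = ToyModel.from_pretrained(d)
    print(reloaded.config.hidden_size)  # 32
```

The real library layers weights, sharding and Hub uploads on top, but the user-facing shape stays this small.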
| 81 |
|
| 82 |
|
| 83 |
When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with what is above.
|
| 108 |
|
| 109 |
However, both of these works were already pointing at some drawbacks, which have been iteratively addressed. [Transformers has gone modular](https://huggingface.co/docs/transformers/en/modular_transformers) , allowing a form of inheritance without breaking [One model, One file](#one-model-one-file). If you're familiar with this, you can [skip this section](#^attention-classes) and go to the next one.
|
| 110 |
|
| 111 |
+
We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.
|
| 112 |
+
|
| 113 |
It is explained in detail in the documentation above, but overall it works like this: you define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_:
|
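Conceptually (a pure-Python sketch, not the real code generator), a `modular_` file states only the delta from a parent model, while the auto-generated modeling file contains the fully expanded, self-contained class:

```python
# --- what a "parent" modeling file provides ---
class ParentMLP:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def forward(self, x):
        return [v * 2 for v in x]  # stand-in for the real computation

# --- modular_newmodel.py style: inherit, write only what differs ---
class NewModelMLP(ParentMLP):
    def forward(self, x):
        return [v * 2 + 1 for v in x]  # only the changed behavior appears here

# --- modeling_newmodel.py style: the generated file repeats everything ---
class GeneratedNewModelMLP:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def forward(self, x):
        return [v * 2 + 1 for v in x]  # expanded copy: readable top-to-bottom

print(NewModelMLP(8).forward([1, 2]))           # [3, 5]
print(GeneratedNewModelMLP(8).forward([1, 2]))  # [3, 5]
```

Both behave identically; the difference is that the contributor maintains the short modular class, while users read the expanded one.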
| 114 |
|
| 115 |
<summary>Auto-generated modeling code</summary>
|
| 134 |
|
| 135 |
For better _information_, we plan to use `python` features such as `Annotated` to inform users of what we typically expect in an argument. That way, higher-level information could be included directly in the type annotations.
|
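For instance (a stdlib-only sketch; the concrete annotation scheme in transformers is still being designed, and the argument names here are illustrative), `typing.Annotated` can carry human-readable expectations alongside the type:

```python
from typing import Annotated, get_type_hints

def attention(
    hidden_states: Annotated[list, "shape: (batch, seq_len, hidden_dim)"],
):
    # The annotation informs; it does not enforce anything at runtime.
    return hidden_states

# Tools (docs, IDEs, validators) can recover the extra information:
hints = get_type_hints(attention, include_extras=True)
print(hints["hidden_states"].__metadata__)  # ('shape: (batch, seq_len, hidden_dim)',)
```

This matches the INFORM-but-not-ENFORCE stance: the metadata is there for readers and tooling, while the runtime stays permissive.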
| 136 |
|
| 137 |
+
## <a id="simpler-tensor-parallelism"></a> Simpler Tensor Parallelism
|
| 138 |
+
|
| 139 |
+
We want to touch the modeling code minimally, and only modify it when _architectural changes_ are involved. For instance, for tensor parallelism, we now instead specify a simple `tp_plan`.
|
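As a hedged sketch of the idea (the pattern syntax here is simplified; see the library documentation for the real `tp_plan` format), a plan is just a mapping from parameter-name patterns to sharding styles, resolved outside the modeling code:

```python
from fnmatch import fnmatch

# Illustrative plan: glob-style patterns mapped to a sharding style.
tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.down_proj": "rowwise",
}

def sharding_for(param_name, plan):
    """Return the sharding style for a parameter, or None to keep it replicated."""
    for pattern, style in plan.items():
        if fnmatch(param_name, pattern):
            return style
    return None

print(sharding_for("layers.3.self_attn.q_proj", tp_plan))  # colwise
print(sharding_for("embed_tokens.weight", tp_plan))        # None
```

The modeling file never mentions parallelism; the plan is data that the loading machinery interprets.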
| 140 |
+
|
| 141 |
+
## <a id="layers-attentions-caches"></a> Layers, attentions and caches
|
| 142 |
+
With th
|
| 143 |
+
|
| 144 |
+
## <a id="community-kernels"></a>Community Kernels
|
| 145 |
|
| 146 |
The same principle extends to normalization, activation, and other hot paths. The model defines **semantics**; a kernel defines **how** to execute them faster. We annotate the module to borrow a community‑provided forward, keeping a [consistent public surface](#consistent-public-surface)
|
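A toy illustration of the mechanism (a simplified stand-in, not the real `use_kernel_forward_from_hub` implementation; the registry here is invented): a decorator can swap in a faster `forward` while the class keeps defining the semantics:

```python
# Invented registry standing in for hub-provided kernels.
KERNEL_HUB = {
    "RMSNorm": lambda self, x: [v / (sum(w * w for w in x) ** 0.5 or 1.0) for v in x],
}

def use_kernel_forward_from_hub(name):
    """Toy decorator: replace forward with a registered kernel if one exists."""
    def decorate(cls):
        kernel = KERNEL_HUB.get(name)
        if kernel is not None:
            cls.forward = kernel  # same signature, different (faster) implementation
        return cls
    return decorate

@use_kernel_forward_from_hub("RMSNorm")
class ToyRMSNorm:
    def forward(self, x):
        # Reference semantics; would run if no kernel were registered.
        norm = sum(v * v for v in x) ** 0.5 or 1.0
        return [v / norm for v in x]

print(ToyRMSNorm().forward([3.0, 4.0]))  # [0.6, 0.8]
```

The public surface (class name, method, signature) is untouched, which is exactly why the swap is safe.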
| 147 |
|
| 258 |
|
| 259 |

|
| 260 |
|
| 261 |
+
## <a id="encoders-ftw"></a> The neverending stories of encoder models.
|
| 262 |
|
| 263 |
Model popularity speaks for itself! This is because encoder usage obviously lies in embeddings. So we have to keep the encoders viable, usable, and fine-tunable.
|
dist/fragments/memory-profiler.html
CHANGED
|
@@ -1,61 +1,16 @@
|
|
| 1 |
<div style="border: 1px solid #e2e8f0; border-radius: 8px; background: white; margin: 1.5rem 0;">
|
| 2 |
<div style="padding: 1rem; border-bottom: 1px solid #e2e8f0; background: #f8f9fa;">
|
| 3 |
-
<h4 style="margin: 0 0 0.5rem 0; color: #495057;">🚀
|
| 4 |
<p style="margin: 0; font-size: 0.9em; color: #6c757d;">
|
| 5 |
-
|
| 6 |
</p>
|
| 7 |
</div>
|
| 8 |
|
| 9 |
<div style="padding: 1rem;">
|
| 10 |
-
<
|
| 11 |
-
<div>
|
| 12 |
-
<label style="display: block; font-weight: 600; margin-bottom: 0.5rem; color: #374151;">Model to Profile:</label>
|
| 13 |
-
<select id=memory-model-select style="width: 100%; padding: 0.5rem; border: 1px solid #d1d5db; border-radius: 6px; background: white;">
|
| 14 |
-
<option value=openai-community/gpt2>openai-community/gpt2</option>
|
| 15 |
-
<option value=google/gemma-2-2b>google/gemma-2-2b</option>
|
| 16 |
-
<option value=microsoft/DialoGPT-small>microsoft/DialoGPT-small</option>
|
| 17 |
-
<option value=facebook/opt-125m>facebook/opt-125m</option>
|
| 18 |
-
</select>
|
| 19 |
-
<div style="font-size: 0.8em; color: #6c757d; margin-top: 0.25rem;">
|
| 20 |
-
Select a model or enter a custom HuggingFace model ID
|
| 21 |
-
</div>
|
| 22 |
-
</div>
|
| 23 |
-
|
| 24 |
-
<div>
|
| 25 |
-
<button id=memory-profile-btn style="padding: 0.75rem 1.5rem; background: #dc2626; color: white; border: none; border-radius: 6px; cursor: pointer; font-weight: 500;">
|
| 26 |
-
🔥 Profile Memory
|
| 27 |
-
</button>
|
| 28 |
-
</div>
|
| 29 |
-
</div>
|
| 30 |
-
|
| 31 |
-
<div id=memory-chart-container style="width: 100%; height: 400px; border: 1px solid #e2e8f0; border-radius: 6px; background: #f8f9fa; position: relative;">
|
| 32 |
-
<div id=memory-placeholder style="position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); text-align: center; color: #6c757d; font-style: italic;">
|
| 33 |
-
Click "Profile Memory" to generate memory allocation timeline
|
| 34 |
-
</div>
|
| 35 |
-
<canvas id=memory-chart width=100% height=400 style="display: none;"></canvas>
|
| 36 |
-
</div>
|
| 37 |
-
|
| 38 |
-
<div id=memory-stats style="margin-top: 1rem; padding: 1rem; background: #f1f5f9; border-radius: 6px; display: none;">
|
| 39 |
-
<h5 style="margin: 0 0 0.5rem 0; color: #374151;">Memory Statistics</h5>
|
| 40 |
-
<div id=memory-results></div>
|
| 41 |
-
</div>
|
| 42 |
</div>
|
| 43 |
|
| 44 |
<div style="padding: 1rem; border-top: 1px solid #e2e8f0; background: #f8f9fa; font-size: 0.9em; color: #6c757d;">
|
| 45 |
-
<
|
| 46 |
-
In the original app, this uses ZeroGPU to measure actual memory allocation timelines.
|
| 47 |
</div>
|
| 48 |
-
</div>
|
| 49 |
-
|
| 50 |
-
<script>document.addEventListener("DOMContentLoaded",function(){let e=document.getElementById("memory-model-select"),t=document.getElementById("memory-profile-btn"),l=document.getElementById("memory-chart-container"),o=document.getElementById("memory-placeholder"),n=document.getElementById("memory-chart"),i=document.getElementById("memory-stats"),m=document.getElementById("memory-results");t.addEventListener("click",function(){let d=e.value;t.disabled=!0,t.textContent="Profiling...",o.innerHTML='<div style="color: #6c757d;"><em>Loading model and measuring memory usage...</em><br><div style="margin-top: 0.5rem;">This may take a few moments</div></div>',i.style.display="none",setTimeout(()=>{let e=[],r=[],a=[];for(let t=0;t<=50;t++){let l=.1*t;e.push(l);let o=Math.max(0,500+15*Math.pow(t,1.5)+50*Math.random());r.push(o);let n=Math.max(0,600+18*Math.pow(t,1.8)+80*Math.random());a.push(n)}o.style.display="none",n.style.display="block";let s=n.getContext("2d"),y=n.width=l.offsetWidth-2,f=n.height=400;s.clearRect(0,0,y,f),s.strokeStyle="#d1d5db",s.beginPath(),s.moveTo(50,20),s.lineTo(50,f-50),s.lineTo(y-20,f-50),s.stroke(),s.strokeStyle="#f3f4f6";for(let e=1;e<10;e++){let t=20+(f-70)*e/10;s.beginPath(),s.moveTo(50,t),s.lineTo(y-20,t),s.stroke()}let g=Math.max(...a),c=(e,t)=>{s.strokeStyle=t,s.lineWidth=3,s.beginPath();for(let t=0;t<e.length;t++){let l=50+(y-70)*t/(e.length-1),o=f-50-(f-70)*e[t]/g;0===t?s.moveTo(l,o):s.lineTo(l,o)}s.stroke()};c(a,"#ef4444"),c(r,"#22c55e"),s.fillStyle="#374151",s.font="14px sans-serif",s.fillText("Memory (MiB)",10,f/2),s.fillText("Time (seconds)",y/2-50,f-10),s.fillStyle="#ef4444",s.fillRect(y-200,30,15,15),s.fillStyle="#374151",s.fillText("\uD83D\uDCC8 Warmup OFF (Standard)",y-180,42),s.fillStyle="#22c55e",s.fillRect(y-200,50,15,15),s.fillStyle="#374151",s.fillText("\uD83D\uDE80 Warmup ON (Optimized)",y-180,62);let h=Math.max(...r),u=Math.max(...a),p=(u-h)/u*100;m.innerHTML=`
|
| 51 |
-
<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem;">
|
| 52 |
-
<div>
|
| 53 |
-
<strong>Peak Memory (Warmup OFF):</strong> ${u.toFixed(0)} MiB<br>
|
| 54 |
-
<strong>Peak Memory (Warmup ON):</strong> ${h.toFixed(0)} MiB
|
| 55 |
-
</div>
|
| 56 |
-
<div>
|
| 57 |
-
<strong>Memory Savings:</strong> ${p.toFixed(1)}%<br>
|
| 58 |
-
<strong>Model:</strong> ${d}
|
| 59 |
-
</div>
|
| 60 |
-
</div>
|
| 61 |
-
`,i.style.display="block",t.disabled=!1,t.textContent="\uD83D\uDD25 Profile Memory"},3e3)})})</script>
|
| 1 |
<div style="border: 1px solid #e2e8f0; border-radius: 8px; background: white; margin: 1.5rem 0;">
|
| 2 |
<div style="padding: 1rem; border-bottom: 1px solid #e2e8f0; background: #f8f9fa;">
|
| 3 |
+
<h4 style="margin: 0 0 0.5rem 0; color: #495057;">🚀 CUDA Warmup Efficiency Benchmark</h4>
|
| 4 |
<p style="margin: 0; font-size: 0.9em; color: #6c757d;">
|
| 5 |
+
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the caching_allocator_warmup function.
|
| 6 |
</p>
|
| 7 |
</div>
|
| 8 |
|
| 9 |
<div style="padding: 1rem;">
|
| 10 |
+
<iframe src=https://molbap-cuda-warmup-transformers.hf.space width=100% height=800px frameborder=0 style="border-radius: 8px; background: white;"></iframe>
|
| 11 |
</div>
|
| 12 |
|
| 13 |
<div style="padding: 1rem; border-top: 1px solid #e2e8f0; background: #f8f9fa; font-size: 0.9em; color: #6c757d;">
|
| 14 |
+
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the <code>caching_allocator_warmup</code> function at <code>transformers/src/transformers/modeling_utils.py:6186</code>. This interactive tool loads models twice, once with warmup disabled and once with warmup enabled, to demonstrate the significant loading time improvements.
|
| 15 |
</div>
|
| 16 |
+
</div>
|
dist/hf-logo.svg
ADDED
|
|
dist/index.html
CHANGED
|
@@ -8,22 +8,22 @@
|
|
| 8 |
<script src="https://d3js.org/d3.v7.min.js"></script>
|
| 9 |
<meta name="viewport" content="width=device-width, initial-scale=1">
|
| 10 |
<meta charset="utf8">
|
| 11 |
-
<title>
|
| 12 |
<link rel="stylesheet" href="style.css">
|
| 13 |
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css">
|
| 14 |
</head>
|
| 15 |
<body>
|
| 16 |
<d-front-matter>
|
| 17 |
<script id='distill-front-matter' type="text/json">{
|
| 18 |
-
"title": "
|
| 19 |
-
"description": "
|
| 20 |
"published": "Aug 21, 2025",
|
| 21 |
"authors": [{"author": "Pablo Montalvo", "authorURL": "https://huggingface.co/Molbap"}]
|
| 22 |
}</script>
|
| 23 |
</d-front-matter>
|
| 24 |
<d-title>
|
| 25 |
-
<h1>
|
| 26 |
-
<p>
|
| 27 |
</d-title>
|
| 28 |
<d-byline></d-byline>
|
| 29 |
<d-article>
|
|
@@ -47,51 +47,78 @@
|
|
| 47 |
</div>
|
| 48 |
</nav>
|
| 49 |
</d-contents>
|
| 50 |
-
<
|
| 51 |
-
<
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
<h3>What you will learn</h3>
|
| 56 |
<p>Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the <code>transformers</code> code base, how to use it better, how to meaningfully contribute to it.
|
| 57 |
This will also showcase new features you might have missed so you’ll be up-to-date.</p>
|
| 58 |
<p>So, what are the principles of <code>transformers</code>? We will try to summarize the foundations on which we’ve built everything, and write the “tenets” of the library. They behave like <em>software interfaces</em>, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.</p>
|
| 59 |
-
<
|
| 60 |
-
<
|
| 61 |
-
<
|
| 62 |
-
|
| 63 |
-
<
|
| 64 |
-
<p
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
<
|
| 68 |
-
|
| 69 |
-
<
|
| 70 |
-
<p
|
|
| 71 |
</li>
|
| 72 |
-
<li>
|
| 73 |
-
<
|
| 74 |
-
|
| 75 |
</li>
|
| 76 |
-
<li>
|
| 77 |
-
<
|
| 78 |
</li>
|
| 79 |
-
<li>
|
| 80 |
-
<
|
| 81 |
-
<
|
| 82 |
-
<
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
<
|
| 87 |
-
|
| 88 |
-
<
|
| 89 |
-
<p
|
| 90 |
</li>
|
| 91 |
</ol>
|
| 92 |
-
|
| 93 |
-
<li>This is the largest change. We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is <em>better</em> for your model to be able to inherit from PreTrainedModel and have enabled TensorParallel, from_pretrained, sharding, push_to_hub, loss, as well as PEFT/TRL/SGLang/vLLM.</li>
|
| 94 |
-
</ul>
|
| 95 |
<p>When a PR is merged, it is because the contribution is worthwhile, and because the <code>transformers</code> team finds the design of the contribution to be aligned with what is above.</p>
|
| 96 |
<p>Does all the code in the library strictly follow these tenets? No. The library is a gigantic house with connected nooks, corridors, and crannies everywhere, built by thousands of different workers. We <em>try</em> to make it so all the code added is in line, lest we break <a href="#backwards-compatibility">backwards compatibility</a>.</p>
|
| 97 |
<p>For instance, one function essential to the implementation of <a href="https://huggingface.co/papers/2104.09864">Rotary Positional Embeddings</a> is identical in 70 <code>modeling_<file>.py</code> across <code>src/transformers/models/</code>. Why keep it? Because removing it would make those files unloadable checkpoints rather than self-contained blueprints. We <a href="#do-repeat-yourself">do repeat ourselves</a>.</p>
|
|
@@ -106,8 +133,8 @@ This will also showcase new features you might have missed so you’ll be up-to-
|
|
| 106 |
<p>As I was looking for things to improve and make better, it’s one of the iterations I attempted: a function is almost everywhere the same, let’s import it from some common file? But no! Goes against</p>
|
| 107 |
<h2><a id="modular"></a> Going modular</h2>
|
| 108 |
<p>However, both of these works were already pointing at some drawbacks, which have been iteratively addressed. <a href="https://huggingface.co/docs/transformers/en/modular_transformers">Transformers has gone modular</a> , allowing a form of inheritance without breaking <a href="#one-model-one-file">One model, One file</a>. If you’re familiar with this, you can <a href="#%5Eattention-classes">skip this section</a> and go to the next one.</p>
|
| 109 |
-
<p>We amended the principle of <a href="#do-repeat-yourself">DRY*</a> by removing progressively
|
| 110 |
-
It is explained in details in the documentation above, but overall it works like this, you define a <code>modular_</code> file that can inherit from <em>any function across all other modeling, configuration and processor files</em>:</p>
|
| 111 |
<summary>Auto-generated modeling code</summary>
|
| 112 |
<p><div class=code-compare style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1.5rem 0;">
|
| 113 |
<div class=code-column style="border: 1px solid #e2e8f0; border-radius: 8px; overflow: hidden;">
|
|
@@ -267,14 +294,18 @@ if self.config._attn_implementation != "eager":
|
|
| 267 |
</code></pre>
|
| 268 |
<p>We often read and understand that <code>kwargs</code> are criticized, and we are typing them however we can, but we cannot enforce them all the time because other libraries such as vLLM don’t use the same kwargs.</p>
|
| 269 |
<p>It is a strength of the new attention interface, where it can be plugged in various backends, because most of the signature is not enforced. We INFORM but do not ENFORCE. That way, the current system is a <a href="#minimal-user-api">minimal user api</a>.</p>
|
| 270 |
-
<p>For a better <em>information</em>, we plan to use <code>python</code>features such as <code>
|
| 271 |
-
<h2>
|
| 272 |
<p>The same principle extends to normalization, activation, and other hot paths. The model defines <strong>semantics</strong>; a kernel defines <strong>how</strong> to execute them faster. We annotate the module to borrow a community‑provided forward, keeping a <a href="#consistent-public-surface">consistent public surface</a></p>
|
| 273 |
<pre><code class="language-python">@use_kernel_forward_from_hub("RMSNorm")
|
| 274 |
class GlmRMSNorm(nn.Module):
|
| 275 |
...
|
| 276 |
</code></pre>
|
| 277 |
-
<p>Plus, this opened another angle of contribution for the community. People who are GPU
|
| 278 |
<h2>The good modularity</h2>
|
| 279 |
<p>Now, we have a form of inheritance in our codebase. Some models become standards, and model contributors are given the opportunity to <em>define standards</em>. Pushing the boundaries of scientific knowledge can translate into the boundaries of engineering if this effort is made, and we’re striving for it.</p>
|
| 280 |
<p>My capacity for abstraction is not that great, compared to other computer scientists and engineers: I need to look at little doodles and drawings, especially when components pile up.</p>
|
|
@@ -385,7 +416,7 @@ Example outputs:
|
|
| 385 |
<p>So the question abounds naturally: How can we modularize more?
|
| 386 |
I took a similarity measure again and looked at the existing graphs. The tool is available on this <a href="https://huggingface.co/spaces/Molbap/transformers-modular-refactor">ZeroGPU-enabled Space</a>. It scans the whole transformers repository, and outputs a graph of candidates across models, using either a Jaccard similarity index (simple) or a SentenceTransformers embedding model. It is understandable that <a href="#encoders-ftw">encoder models still have a lion’s share of the game.</a> See also <a href="https://huggingface.co/blog/train-sparse-encoder">Tom Aarsen and Arthur Bresnu’s great blog post on the topic of sparse embeddings</a>.</p>
|
| 387 |
<p><img src="static/modular_candidates.png" alt="Modular candidates analysis"></p>
|
| 388 |
-
<h2><a id="encoders-ftw"></a>
|
| 389 |
<p>Model popularity speaks for itself! This is because encoder usage obviously lies in embeddings. So we have to keep the encoders viable, usable, and fine-tunable.</p>
|
| 390 |
<p><img src="static/popular_models_barplot.png" alt="Popular models bar plot"></p>
|
| 391 |
<h2>On image processing and processors</h2>
|
|
@@ -458,79 +489,23 @@ machinery is the <code>attention mask</code>, cause of confusion. Thankfully, we
|
|
| 458 |
<li>usable in vLLM, SGLang, and so on without additional code.</li>
|
| 459 |
</ul>
|
| 460 |
<h2>Inner cooking: CUDA Warmup</h2>
|
| 461 |
-
<p>Having a clean <em>external</em> API allows us to work on the true inner workings of transformers. One of the few recent additions was the <em>CUDA warmup</em> via <code>caching_allocator_warmup</code>, which massively improved the loading
|
| 462 |
-
<div
|
| 463 |
-
<div
|
| 464 |
-
<
|
| 465 |
</div>
|
| 466 |
-
|
| 467 |
<iframe src=https://molbap-cuda-warmup-transformers.hf.space width=100% height=800px frameborder=0 style="border-radius: 8px; background: white;"></iframe>
|
| 468 |
</div>
|
| 469 |
-
|
| 470 |
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the <code>caching_allocator_warmup</code> function at <code>transformers/src/transformers/modeling_utils.py:6186</code>. This interactive tool loads models twice, once with warmup disabled and once with warmup enabled, to demonstrate the significant loading time improvements.
|
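To illustrate why warmup helps (a conceptual, pure-Python toy, not the actual CUDA caching allocator or the <code>caching_allocator_warmup</code> implementation): pre-touching a caching pool means later allocations reuse cached buffers instead of asking the system for new ones.

```python
class ToyCachingAllocator:
    """Toy model of a caching allocator: freed buffers are kept and reused."""
    def __init__(self):
        self.cache = {}      # size -> list of free buffers
        self.fresh_allocs = 0

    def alloc(self, size):
        free = self.cache.get(size, [])
        if free:
            return free.pop()    # cache hit: cheap
        self.fresh_allocs += 1   # fresh allocation: expensive on real devices
        return bytearray(size)

    def free(self, buf):
        self.cache.setdefault(len(buf), []).append(buf)

alloc = ToyCachingAllocator()

# Warmup: allocate and free the expected buffer sizes once, up front.
for size in (1024, 2048):
    alloc.free(alloc.alloc(size))

# The actual "load" now hits the cache instead of the system allocator.
before = alloc.fresh_allocs
for size in (1024, 2048):
    alloc.alloc(size)
print(alloc.fresh_allocs - before)  # 0: every load-time allocation was cached
```

On a real GPU the expensive step is cudaMalloc, so paying it once during warmup rather than per-tensor during loading is what produces the speedup measured above.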
| 471 |
</div>
|
| 472 |
-
</div>
|
| 534 |
<h3>Linkedin post (to remove)</h3>
|
| 535 |
<p>Linkedin post for videos:</p>
|
| 536 |
<p>In transformers, how do we deal with cross-model dependencies, while supporting ~400 models? Maybe you’ve seen the same 200-lines functions in too many <em>modeling_file.py</em>? Duplication isn’t inevitable.</p>
|
|
@@ -566,7 +541,7 @@ machinery is the <code>attention mask</code>, cause of confusion. Thankfully, we
// Extract tenet text for tooltips
const tenetTooltips = {
'source-of-truth': 'We aim to be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances.',
'one-model-one-file': 'All inference (and most of training, loss is separate, not a part of model) logic visible, top‑to‑bottom.',
'code-is-product': 'Optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words; readability is primordial.',
'standardize-dont-abstract': 'If it\'s model behavior, keep it in the file; abstractions only for generic infra.',
<script src="https://d3js.org/d3.v7.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta charset="utf-8">
<title>Scaling insanity: maintaining hundreds of model definitions</title>
<link rel="stylesheet" href="style.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css">
</head>
<body>
<d-front-matter>
<script id='distill-front-matter' type="text/json">{
"title": "Scaling insanity: maintaining hundreds of model definitions",
"description": "A peek into software engineering for the transformers library",
"published": "Aug 21, 2025",
"authors": [{"author": "Pablo Montalvo", "authorURL": "https://huggingface.co/Molbap"}]
}</script>
</d-front-matter>
<d-title>
<h1>Scaling insanity: maintaining hundreds of model definitions</h1>
<p>A peek into software engineering for the transformers library</p>
</d-title>
<d-byline></d-byline>
<d-article>
</div>
</nav>
</d-contents>
<h2>Introduction</h2>
<p>The <code>transformers</code> library, built with <code>PyTorch</code>, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, and classical encoders: almost 400 models in total.
The name of the library itself is mostly majority-driven, as many models are not even transformer architectures, like Mamba, Zamba, RWKV, and convolution-based models.
Regardless, each of these was wrought by the research and engineering team that created it, then harmonized into a now famous interface, and made callable with a simple <code>.from_pretrained</code> command.
Inference works for all models; training is functional for most. The library is a foundation for many machine learning courses and cookbooks, and overall, several thousand other open-source libraries depend on it. All models are tested as part of a daily CI ensuring their preservation and reproducibility. Most importantly, it is <em>open-source</em> and has been written in large part by the community.
This isn’t really to brag but to set the stakes: what does it take to keep such a ship afloat, made of so many moving, unrelated parts?
The ML wave has not stopped: more and more models are being added, at a steadily growing rate. <code>Transformers</code> is widely used, and we read the feedback that users post online, whether it’s about a function that had 300+ keyword arguments, duplicated code and helpers, mentions of <code>Copied from ... </code> everywhere, or optimisation concerns. Text-only models are relatively tame, but multimodal models remain to be harmonized.</p>
<p>Here we will dissect the new design philosophy of transformers, as a continuation of the existing older <a href="https://huggingface.co/docs/transformers/en/philosophy">philosophy</a> page and an accompanying <a href="https://huggingface.co/blog/transformers-design-philosophy">blog post from 2022</a>. Some time ago, I dare not say how long, we discussed the state of things with transformers maintainers. A lot of recent developments were satisfactory, but if we were only talking about those, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.</p>
<h3>What you will learn</h3>
<p>Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the <code>transformers</code> code base, how to use it better, and how to meaningfully contribute to it.
This will also showcase new features you might have missed, so you’ll be up-to-date.</p>
<p>So, what are the principles of <code>transformers</code>? We will try to summarize the foundations on which we’ve built everything, and write down the “tenets” of the library. They behave like <em>software interfaces</em>, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.</p>
<div class="tenet-list">
<ol>
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
<p>We should be a source of truth for all model definitions. This is not a tenet, but something that still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performances.</p>
<em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
</li>
<li class="tenet">
<a id="one-model-one-file"></a>
<strong>One Model, One File</strong>
<p>All inference (and most of training; the loss is separate, not a part of the model) logic visible, top‑to‑bottom.</p>
<em>Every model should be completely understandable by reading a single file from top to bottom.</em>
</li>
<li class="tenet">
<a id="code-is-product"></a>
<strong>Code is Product</strong>
<p>Optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words; readability is primordial.</p>
<em>Code quality matters as much as functionality: optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
<p>If it's model behavior, keep it in the file; abstractions are only for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
<a id="do-repeat-yourself"></a>
<strong>DRY* (DO Repeat Yourself)</strong>
<p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
<p><strong>Amendment:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end-user files remain faithful to the original tenet.</p>
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
</li>
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
<p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want as few codepaths as possible. Reading should be obvious, configurations should be obvious.</p>
<em>Keep the public interface simple and predictable: users should know what to expect.</em>
</li>
<li class="tenet">
<a id="backwards-compatibility"></a>
<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, <strong>never</strong> break public APIs.</p>
<p><strong>Note:</strong> Some models see almost no use, and we have stopped adding new features for non-torch frameworks. Still, we adapt to models existing on the hub.</p>
<em>Once something is public, it stays public: evolution through addition, not breaking changes.</em>
</li>
<li class="tenet">
<a id="consistent-public-surface"></a>
<strong>Consistent Public Surface</strong>
<p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests.</p>
<em>All models should feel familiar: consistent interfaces reduce cognitive load.</em>
</li>
<li class="tenet">
<a id="modular-toolbox"></a>
<strong>Modular Toolbox (Not Framework)</strong>
<p>We ARE a toolbox. What we are not is a framework: you should not be FORCED to rewrite every modeling file, but it is <em>better</em> for your model to inherit from PreTrainedModel and get TensorParallel, from_pretrained, sharding, push_to_hub, and loss for free, as well as PEFT/TRL/SGLang/vLLM support.</p>
<em>This is the largest change. Provide tools and utilities, but don't force users into a rigid framework.</em>
</li>
</ol>
</div>
<p>When a PR is merged, it is because the contribution is worthwhile, and because the <code>transformers</code> team finds the design of the contribution aligned with what is above.</p>
<p>Does all the code in the library strictly follow these tenets? No. The library is a gigantic house with connected nooks, corridors, and crannies everywhere, built by thousands of different workers. We <em>try</em> to make it so all the code added is in line with them, lest we break <a href="#backwards-compatibility">backwards compatibility</a>.</p>
<p>For instance, one function essential to the implementation of <a href="https://huggingface.co/papers/2104.09864">Rotary Positional Embeddings</a> is identical in 70 <code>modeling_&lt;file&gt;.py</code> files across <code>src/transformers/models/</code>. Why keep it? Because removing it would make those files unloadable checkpoints rather than self-contained blueprints. We <a href="#do-repeat-yourself">do repeat ourselves</a>.</p>
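<p>For readers who have not met that function, the recurring rotary-embedding helper looks roughly like this. This is a numpy sketch of the well-known idea, not the library's exact torch code:</p>

```python
import numpy as np

def rotate_half(x: np.ndarray) -> np.ndarray:
    # Split the last dimension into two halves, swap them, and flip the
    # sign of the second half, as rotary position embeddings require.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate((-x2, x1), axis=-1)

def apply_rotary_pos_emb(q: np.ndarray, cos: np.ndarray, sin: np.ndarray) -> np.ndarray:
    # Rotate query (or key) states by position-dependent cos/sin tables.
    return q * cos + rotate_half(q) * sin
```

Because this helper is pure tensor arithmetic with no model-specific state, it is exactly the kind of code that looks like an abstraction candidate, yet stays duplicated so each modeling file remains a self-contained blueprint.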
<p>As I was looking for things to improve, this is one of the iterations I attempted: a function is almost everywhere the same, so why not import it from some common file? But no! That would go against <a href="#one-model-one-file">One model, One file</a>.</p>
<h2><a id="modular"></a> Going modular</h2>
<p>However, both of these works were already pointing at some drawbacks, which have been iteratively addressed. <a href="https://huggingface.co/docs/transformers/en/modular_transformers">Transformers has gone modular</a>, allowing a form of inheritance without breaking <a href="#one-model-one-file">One model, One file</a>. If you’re familiar with this, you can <a href="#%5Eattention-classes">skip this section</a> and go to the next one.</p>
<p>We amended the principle of <a href="#do-repeat-yourself">DRY*</a> by progressively removing all pieces of code that were “copied from” another file.</p>
<p>It is explained in detail in the documentation above, but overall it works like this: you define a <code>modular_</code> file that can inherit from <em>any function across all other modeling, configuration and processor files</em>:</p>
<summary>Auto-generated modeling code</summary>
<p><div class="code-compare" style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1.5rem 0;">
<div class="code-column" style="border: 1px solid #e2e8f0; border-radius: 8px; overflow: hidden;">
</code></pre>
<p>We often read and understand that <code>kwargs</code> are criticized. We are typing them wherever we can, but we cannot enforce them all the time, because other libraries such as vLLM don’t use the same kwargs.</p>
<p>It is a strength of the new attention interface that it can be plugged into various backends, because most of the signature is not enforced. We INFORM but do not ENFORCE. That way, the current system is a <a href="#minimal-user-api">minimal user api</a>.</p>
<p>For better <em>information</em>, we plan to use <code>python</code> features such as <code>Annotated</code> to inform users of what we typically expect in an argument. That way, higher-level information could be included directly in the type annotations, telling for instance the expected dimensions and contents of a tensor.</p>
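<p>The mechanism could look like the following sketch. The alias name and the metadata string are hypothetical illustrations of the idea, not the library's actual annotations:</p>

```python
from typing import Annotated, get_args

# Hypothetical sketch: "HiddenStates" and the shape string below are
# illustrative only, not transformers' real type annotations.
HiddenStates = Annotated["torch.Tensor", "shape: (batch, seq_len, hidden_dim)"]

def forward(hidden_states: HiddenStates):
    ...

# The metadata travels with the type and stays inspectable by tooling,
# without enforcing anything at runtime: we inform, we do not enforce.
shape_hint = get_args(HiddenStates)[1]
```

Since `Annotated` metadata is ignored by the interpreter, this conveys richer information to readers and tools while keeping every existing call site valid.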
<h2><a id="simpler-tensor-parallelism"></a> Simpler Tensor Parallelism</h2>
<p>We want to touch the modeling code minimally, and only modify it when <em>architectural changes</em> are involved. For tensor parallelism, for instance, we instead now specify a simple <code>tp_plan</code>.</p>
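<p>In spirit, a <code>tp_plan</code> is just a mapping from module-name patterns to sharding strategies. The patterns below are an illustrative sketch; the real plans ship with each model and may differ:</p>

```python
# Sketch of a tp_plan: keys are module-name patterns, values name the
# sharding strategy applied to that weight. Column-wise shards split the
# output dimension; row-wise shards split the input dimension, so a
# colwise -> rowwise pair needs only one collective at the end.
tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.gate_proj": "colwise",
    "layers.*.mlp.up_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}
```

The modeling code itself stays untouched: the plan is data, applied at load time, which is exactly the "standardize, don't abstract" trade-off.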
<h2><a id="layers-attentions-caches"></a> Layers, attentions and caches</h2>
<p>With th</p>
<h2><a id="community-kernels"></a>Community Kernels</h2>
<p>The same principle extends to normalization, activation, and other hot paths. The model defines <strong>semantics</strong>; a kernel defines <strong>how</strong> to execute them faster. We annotate the module to borrow a community‑provided forward, keeping a <a href="#consistent-public-surface">consistent public surface</a>.</p>
<pre><code class="language-python">@use_kernel_forward_from_hub("RMSNorm")
class GlmRMSNorm(nn.Module):
    ...
</code></pre>
<p>Plus, this opened another angle of contribution for the community. People who are GPU whisperers can check out the <a href="https://huggingface.co/blog/hello-hf-kernels">kernel community blog post</a> to learn more about it!</p>
<h2>The good modularity</h2>
<p>Now, we have a form of inheritance in our codebase. Some models become standards, and model contributors are given the opportunity to <em>define standards</em>. Pushing the boundaries of scientific knowledge can translate into pushing the boundaries of engineering, if this effort is made, and we’re striving for it.</p>
<p>My capacity for abstraction is not that great, compared to other computer scientists and engineers: I need to look at little doodles and drawings, especially when components pile up.</p>
<p>So the question arises naturally: how can we modularize more?
I took a similarity measure again and looked at the existing graphs. The tool is available on this <a href="https://huggingface.co/spaces/Molbap/transformers-modular-refactor">ZeroGPU-enabled Space</a>. It scans the whole transformers repository and outputs a graph of candidates across models, using either a Jaccard similarity index (simple) or a SentenceTransformers embedding model. It is understandable that <a href="#encoders-ftw">encoder models still have the lion’s share of the game</a>. See also <a href="https://huggingface.co/blog/train-sparse-encoder">Tom Aarsen and Arthur Bresnu’s great blog post on the topic of sparse embeddings</a>.</p>
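<p>The simple variant of that measure is easy to sketch: tokenize two source snippets and compare their token sets. A minimal sketch, not the Space's exact code:</p>

```python
def jaccard_similarity(code_a: str, code_b: str) -> float:
    # Jaccard index between the token sets of two source snippets:
    # |A ∩ B| / |A ∪ B|, ranging from 0 (disjoint) to 1 (identical sets).
    tokens_a, tokens_b = set(code_a.split()), set(code_b.split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)
```

Two modeling files that differ only in a few identifiers score close to 1, which is what flags them as candidates for a shared modular definition.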
<p><img src="static/modular_candidates.png" alt="Modular candidates analysis"></p>
<h2><a id="encoders-ftw"></a> The never-ending stories of encoder models</h2>
<p>Model popularity speaks for itself! This is because the main usage of encoders obviously lies in embeddings. So we have to keep the encoder part of the library viable, usable, fine-tunable.</p>
<p><img src="static/popular_models_barplot.png" alt="Popular models bar plot"></p>
<h2>On image processing and processors</h2>
<li>usable in vLLM, SGLang, and so on without additional code.</li>
</ul>
<h2>Inner cooking: CUDA Warmup</h2>
<p>Having a clean <em>external</em> API allows us to work on the true inner workings of transformers. One of the recent additions was the <em>CUDA warmup</em> via <code>caching_allocator_warmup</code>, which massively improved loading time by pre-allocating GPU memory to avoid malloc bottlenecks during model loading.</p>
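<p>The intuition can be shown with a CPU-side toy analogy: reserve one big buffer up front instead of allocating once per tensor. This is not the actual CUDA allocator code, just the one-big-reservation-versus-many idea:</p>

```python
import time

def load_naive(n_tensors: int, size: int) -> float:
    """Allocate one buffer per tensor: every allocation may hit the allocator."""
    start = time.perf_counter()
    buffers = [bytearray(size) for _ in range(n_tensors)]
    elapsed = time.perf_counter() - start
    assert len(buffers) == n_tensors
    return elapsed

def load_warmed(n_tensors: int, size: int) -> float:
    """Reserve one big pool up front, then hand out zero-copy views into it."""
    start = time.perf_counter()
    pool = bytearray(n_tensors * size)  # the "warmup": one large reservation
    views = [memoryview(pool)[i * size : (i + 1) * size] for i in range(n_tensors)]
    elapsed = time.perf_counter() - start
    assert len(views) == n_tensors
    return elapsed
```

On a GPU the effect is far larger than on CPU, because each fresh `cudaMalloc` is expensive, which is why pre-touching the allocator before loading weights pays off.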
<p><div style="border: 1px solid #e2e8f0; border-radius: 8px; background: white; margin: 1.5rem 0;">
<div style="padding: 1rem; border-bottom: 1px solid #e2e8f0; background: #f8f9fa;">
<h4 style="margin: 0 0 0.5rem 0; color: #495057;">🚀 CUDA Warmup Efficiency Benchmark</h4>
<p style="margin: 0; font-size: 0.9em; color: #6c757d;">
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the caching_allocator_warmup function.
</p>
</div>
<div style="padding: 1rem;">
<iframe src="https://molbap-cuda-warmup-transformers.hf.space" width="100%" height="800px" frameborder="0" style="border-radius: 8px; background: white;"></iframe>
</div>
<div style="padding: 1rem; border-top: 1px solid #e2e8f0; background: #f8f9fa; font-size: 0.9em; color: #6c757d;">
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the <code>caching_allocator_warmup</code> function at <code>transformers/src/transformers/modeling_utils.py:6186</code>. This interactive tool loads the model twice, once with warmup disabled and once with warmup enabled, to demonstrate the significant loading-time improvements.
</div>
</div></p>
<h3>Linkedin post (to remove)</h3>
<p>Linkedin post for videos:</p>
<p>In transformers, how do we deal with cross-model dependencies, while supporting ~400 models? Maybe you’ve seen the same 200-line functions in too many <em>modeling_file.py</em>? Duplication isn’t inevitable.</p>
// Extract tenet text for tooltips
const tenetTooltips = {
'source-of-truth': 'We aim to be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances.',
'one-model-one-file': 'All inference (and most of training, loss is separate, not a part of model) logic visible, top‑to‑bottom.',
'code-is-product': 'Optimize for reading, diffing, and tweaking; our users are power users. Variables can be explicit, full words, even several words; readability is primordial.',
'standardize-dont-abstract': 'If it\'s model behavior, keep it in the file; abstractions only for generic infra.',
dist/main.bundle.js
CHANGED
@@ -1628,11 +1628,18 @@ d-article {
}

d-article > * {
    max-width: 900px;
    margin-left: auto;
    margin-right: auto;
}

/* Improve paragraph readability */
d-article p {
    font-size: 18px;
@@ -1682,6 +1689,130 @@ d-article ol li {
    margin-bottom: 0.5rem;
}
/* Improve blockquote styling */
d-article blockquote {
    font-size: 19px;

@@ -1723,7 +1854,7 @@ d-article .memory-chart-container {
.interactive-demo .demo-content {
    padding: 1rem;
}
8px !important;\n padding: 1.5rem !important;\n margin: 1.5rem 0 !important;\n overflow-x: auto !important;\n font-size: 0.9em !important;\n line-height: 1.5 !important;\n}\n\ncode[class*=\"language-\"] {\n background: none !important;\n font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', 'Courier New', monospace !important;\n color: #383a42 !important;\n}\n\n/* Inline code */\np code, li code {\n background: #f1f3f4 !important;\n padding: 0.2em 0.4em !important;\n border-radius: 3px !important;\n font-size: 0.9em !important;\n color: #d73a49 !important;\n}\n\n/* Distill article improvements */\nd-article {\n max-width: none;\n font-size: 18px; /* Increased from default ~16px */\n line-height: 1.7;\n}\n\nd-article > * {\n max-width: 900px;\n margin-left: auto;\n margin-right: auto;\n}\n\n/* Improve paragraph readability */\nd-article p {\n font-size: 18px;\n line-height: 1.8;\n margin-bottom: 1.5rem;\n color: #2d3748;\n}\n\n/* Improve heading sizes */\nd-article h1 {\n font-size: 3rem;\n line-height: 1.2;\n margin: 3rem 0 2rem 0;\n color: #1a202c;\n font-weight: 700;\n}\n\nd-article h2 {\n font-size: 2.5rem;\n line-height: 1.3;\n margin: 2.5rem 0 1.5rem 0;\n color: #1a202c;\n font-weight: 650;\n}\n\nd-article h3 {\n font-size: 2rem;\n line-height: 1.4;\n margin: 2rem 0 1rem 0;\n color: #1a202c;\n font-weight: 600;\n}\n\nd-article h4 {\n font-size: 1.5rem;\n line-height: 1.4;\n margin: 1.5rem 0 1rem 0;\n color: #2d3748;\n font-weight: 600;\n}\n\n/* Improve list readability */\nd-article ul li,\nd-article ol li {\n font-size: 18px;\n line-height: 1.7;\n margin-bottom: 0.5rem;\n}\n\n/* Improve blockquote styling */\nd-article blockquote {\n font-size: 19px;\n line-height: 1.8;\n padding: 1.5rem 2rem;\n margin: 2rem 0;\n border-left: 4px solid #667eea;\n background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 50%);\n border-radius: 0 8px 8px 0;\n font-style: italic;\n color: #4a5568;\n}\n\n/* Full width elements */\nd-article .code-compare,\nd-article 
.interactive-demo,\nd-article .memory-chart-container {\n max-width: none;\n width: 100%;\n margin-left: 0;\n margin-right: 0;\n}\n\n/* Responsive design improvements */\n@media (max-width: 1200px) {\n d-article .code-compare,\n d-article .interactive-demo {\n max-width: 95%;\n margin-left: auto;\n margin-right: auto;\n }\n}\n\n@media (max-width: 768px) {\n .tenet-list li.tenet {\n padding: 1rem;\n }\n \n .interactive-demo .demo-content {\n padding: 1rem;\n }\n}"],"sourceRoot":""}]);
| 1727 | // Exports
| 1728 | /* harmony default export */ const __WEBPACK_DEFAULT_EXPORT__ = (___CSS_LOADER_EXPORT___);
| 1729 |

| 1628 | }
| 1629 |
| 1630 | d-article > * {
| 1631 | +     max-width: 1100px; /* Increased from 900px for more space */
| 1632 |       margin-left: auto;
| 1633 |       margin-right: auto;
| 1634 | }
| 1635 |
| 1636 | + /* Make content even wider on large screens when TOC is present */
| 1637 | + @media (min-width: 1400px) {
| 1638 | +     d-article > * {
| 1639 | +         max-width: 1300px;
| 1640 | +     }
| 1641 | + }
| 1642 | +
| 1643 | /* Improve paragraph readability */
| 1644 | d-article p {
| 1645 |     font-size: 18px;

| 1689 |     margin-bottom: 0.5rem;
| 1690 | }
| 1691 |
| 1692 | + /* Enhanced tenet reference styling with tooltips */
| 1693 | + a[href^="#source-of-truth"],
| 1694 | + a[href^="#one-model-one-file"],
| 1695 | + a[href^="#code-is-product"],
| 1696 | + a[href^="#standardize-dont-abstract"],
| 1697 | + a[href^="#do-repeat-yourself"],
| 1698 | + a[href^="#minimal-user-api"],
| 1699 | + a[href^="#backwards-compatibility"],
| 1700 | + a[href^="#consistent-public-surface"],
| 1701 | + a[href^="#modular-toolbox"] {
| 1702 | +     position: relative;
| 1703 | +     color: #667eea;
| 1704 | +     font-weight: 600;
| 1705 | +     text-decoration: underline;
| 1706 | +     text-decoration-color: rgba(102, 126, 234, 0.3);
| 1707 | +     transition: all 0.3s ease;
| 1708 | +     cursor: help;
| 1709 | + }
| 1710 | +
| 1711 | + a[href^="#source-of-truth"]:hover,
| 1712 | + a[href^="#one-model-one-file"]:hover,
| 1713 | + a[href^="#code-is-product"]:hover,
| 1714 | + a[href^="#standardize-dont-abstract"]:hover,
| 1715 | + a[href^="#do-repeat-yourself"]:hover,
| 1716 | + a[href^="#minimal-user-api"]:hover,
| 1717 | + a[href^="#backwards-compatibility"]:hover,
| 1718 | + a[href^="#consistent-public-surface"]:hover,
| 1719 | + a[href^="#modular-toolbox"]:hover {
| 1720 | +     color: #4c51bf;
| 1721 | +     text-decoration-color: #4c51bf;
| 1722 | +     background: rgba(102, 126, 234, 0.1);
| 1723 | +     padding: 2px 4px;
| 1724 | +     border-radius: 4px;
| 1725 | + }
| 1726 | +
| 1727 | + /* Tooltip content for each tenet */
| 1728 | + a[href^="#source-of-truth"]:after { content: "We should be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances."; }
| 1729 | + a[href^="#one-model-one-file"]:after { content: "All inference (and most of training, loss is separate, not a part of model) logic visible, top‑to‑bottom."; }
| 1730 | + a[href^="#code-is-product"]:after { content: "Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial."; }
| 1731 | + a[href^="#standardize-dont-abstract"]:after { content: "If it's model behavior, keep it in the file; abstractions only for generic infra."; }
| 1732 | + a[href^="#do-repeat-yourself"]:after { content: "Copy when it helps users; keep successors in sync without centralizing behavior."; }
| 1733 | + a[href^="#minimal-user-api"]:after { content: "Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths."; }
| 1734 | + a[href^="#backwards-compatibility"]:after { content: "Evolve by additive standardization, never break public APIs."; }
| 1735 | + a[href^="#consistent-public-surface"]:after { content: "Same argument names, same outputs, hidden states and attentions exposed."; }
| 1736 | + a[href^="#modular-toolbox"]:after { content: "Provide tools and utilities, but don't force users into a rigid framework."; }
| 1737 | +
| 1738 | + /* Universal tooltip styling for tenet references */
| 1739 | + a[href^="#source-of-truth"]:after,
| 1740 | + a[href^="#one-model-one-file"]:after,
| 1741 | + a[href^="#code-is-product"]:after,
| 1742 | + a[href^="#standardize-dont-abstract"]:after,
| 1743 | + a[href^="#do-repeat-yourself"]:after,
| 1744 | + a[href^="#minimal-user-api"]:after,
| 1745 | + a[href^="#backwards-compatibility"]:after,
| 1746 | + a[href^="#consistent-public-surface"]:after,
| 1747 | + a[href^="#modular-toolbox"]:after {
| 1748 | +     position: absolute;
| 1749 | +     bottom: 100%;
| 1750 | +     left: 50%;
| 1751 | +     transform: translateX(-50%);
| 1752 | +     background: #1a202c;
| 1753 | +     color: white;
| 1754 | +     padding: 0.75rem 1rem;
| 1755 | +     border-radius: 8px;
| 1756 | +     font-size: 0.85em;
| 1757 | +     font-weight: 400;
| 1758 | +     white-space: normal;
| 1759 | +     width: 320px;
| 1760 | +     line-height: 1.4;
| 1761 | +     z-index: 1001;
| 1762 | +     opacity: 0;
| 1763 | +     visibility: hidden;
| 1764 | +     transition: opacity 0.3s ease, visibility 0.3s ease;
| 1765 | +     pointer-events: none;
| 1766 | +     box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
| 1767 | +     margin-bottom: 8px;
| 1768 | + }
| 1769 | +
| 1770 | + /* Tooltip arrows */
| 1771 | + a[href^="#source-of-truth"]:before,
| 1772 | + a[href^="#one-model-one-file"]:before,
| 1773 | + a[href^="#code-is-product"]:before,
| 1774 | + a[href^="#standardize-dont-abstract"]:before,
| 1775 | + a[href^="#do-repeat-yourself"]:before,
| 1776 | + a[href^="#minimal-user-api"]:before,
| 1777 | + a[href^="#backwards-compatibility"]:before,
| 1778 | + a[href^="#consistent-public-surface"]:before,
| 1779 | + a[href^="#modular-toolbox"]:before {
| 1780 | +     content: '';
| 1781 | +     position: absolute;
| 1782 | +     bottom: 100%;
| 1783 | +     left: 50%;
| 1784 | +     transform: translateX(-50%);
| 1785 | +     border: 8px solid transparent;
| 1786 | +     border-top-color: #1a202c;
| 1787 | +     z-index: 1002;
| 1788 | +     opacity: 0;
| 1789 | +     visibility: hidden;
| 1790 | +     transition: opacity 0.3s ease, visibility 0.3s ease;
| 1791 | + }
| 1792 | +
| 1793 | + /* Show tooltips on hover */
| 1794 | + a[href^="#source-of-truth"]:hover:after,
| 1795 | + a[href^="#one-model-one-file"]:hover:after,
| 1796 | + a[href^="#code-is-product"]:hover:after,
| 1797 | + a[href^="#standardize-dont-abstract"]:hover:after,
| 1798 | + a[href^="#do-repeat-yourself"]:hover:after,
| 1799 | + a[href^="#minimal-user-api"]:hover:after,
| 1800 | + a[href^="#backwards-compatibility"]:hover:after,
| 1801 | + a[href^="#consistent-public-surface"]:hover:after,
| 1802 | + a[href^="#modular-toolbox"]:hover:after,
| 1803 | + a[href^="#source-of-truth"]:hover:before,
| 1804 | + a[href^="#one-model-one-file"]:hover:before,
| 1805 | + a[href^="#code-is-product"]:hover:before,
| 1806 | + a[href^="#standardize-dont-abstract"]:hover:before,
| 1807 | + a[href^="#do-repeat-yourself"]:hover:before,
| 1808 | + a[href^="#minimal-user-api"]:hover:before,
| 1809 | + a[href^="#backwards-compatibility"]:hover:before,
| 1810 | + a[href^="#consistent-public-surface"]:hover:before,
| 1811 | + a[href^="#modular-toolbox"]:hover:before {
| 1812 | +     opacity: 1;
| 1813 | +     visibility: visible;
| 1814 | + }
| 1815 | +
| 1816 | /* Improve blockquote styling */
| 1817 | d-article blockquote {
| 1818 |     font-size: 19px;

| 1854 | .interactive-demo .demo-content {
| 1855 |     padding: 1rem;
| 1856 | }
| 1857 | + }
1.5rem 2rem;\n margin: 2rem 0;\n border-left: 4px solid #667eea;\n background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 50%);\n border-radius: 0 8px 8px 0;\n font-style: italic;\n color: #4a5568;\n}\n\n/* Full width elements */\nd-article .code-compare,\nd-article .interactive-demo,\nd-article .memory-chart-container {\n max-width: none;\n width: 100%;\n margin-left: 0;\n margin-right: 0;\n}\n\n/* Responsive design improvements */\n@media (max-width: 1200px) {\n d-article .code-compare,\n d-article .interactive-demo {\n max-width: 95%;\n margin-left: auto;\n margin-right: auto;\n }\n}\n\n@media (max-width: 768px) {\n .tenet-list li.tenet {\n padding: 1rem;\n }\n \n .interactive-demo .demo-content {\n padding: 1rem;\n }\n}"],"sourceRoot":""}]);
|
| 1858 |
// Exports
|
| 1859 |
/* harmony default export */ const __WEBPACK_DEFAULT_EXPORT__ = (___CSS_LOADER_EXPORT___);
|
| 1860 |
|
dist/main.bundle.js.map
CHANGED

The diff for this file is too large to render. See raw diff
dist/static/d3_dependency_graph.html
CHANGED

@@ -1827,14 +1827,21 @@ const node = g.append('g')
         .on('end', dragended)
     );

-  // Base-model icon (
+  // Base-model icon (styled circle instead of external image)
   node.filter(d => d.is_base)
-    .append('
-    .attr('
-    .attr('
-    .attr('
-    .attr('width',
+    .append('circle')
+    .attr('r', parseFloat(getComputedStyle(document.documentElement).getPropertyValue('--base-size')) / 2)
+    .attr('fill', '#FFD21E')
+    .attr('stroke', '#FF9D00')
+    .attr('stroke-width', 3);
+
+  // Add 🤗 emoji as text for base models
+  node.filter(d => d.is_base)
+    .append('text')
+    .attr('text-anchor', 'middle')
+    .attr('dy', '0.35em')
+    .style('font-size', '24px')
+    .text('🤗');

   // Base-model label (below icon)
   node.filter(d => d.is_base)
src/fragments/memory-profiler.html
CHANGED

@@ -1,11 +1,16 @@
-<div
-  <div
-    <
+<div style="border: 1px solid #e2e8f0; border-radius: 8px; background: white; margin: 1.5rem 0;">
+  <div style="padding: 1rem; border-bottom: 1px solid #e2e8f0; background: #f8f9fa;">
+    <h4 style="margin: 0 0 0.5rem 0; color: #495057;">🚀 CUDA Warmup Efficiency Benchmark</h4>
+    <p style="margin: 0; font-size: 0.9em; color: #6c757d;">
+      Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the caching_allocator_warmup function.
+    </p>
   </div>
-
+
+  <div style="padding: 1rem;">
   <iframe src=https://molbap-cuda-warmup-transformers.hf.space width=100% height=800px frameborder=0 style="border-radius: 8px; background: white;"></iframe>
   </div>
-
+
+  <div style="padding: 1rem; border-top: 1px solid #e2e8f0; background: #f8f9fa; font-size: 0.9em; color: #6c757d;">
   Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the <code>caching_allocator_warmup</code> function at <code>transformers/src/transformers/modeling_utils.py:6186</code>. This interactive tool loads models twice - once with warmup disabled and once with warmup enabled - to demonstrate the significant loading time improvements.
   </div>
 </div>
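The fragment above describes the benchmark's method: load the model twice, once with the warmup disabled and once with it enabled, and compare wall-clock loading times. A minimal sketch of such an A/B timing harness, assuming hypothetical helper names (`time_call`, `compare_loads`) that are not part of the actual Space's code; in the real tool the two callables would wrap `from_pretrained` runs, with warmup disabled for example by temporarily replacing `caching_allocator_warmup` with a no-op:

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_loads(load_without_warmup, load_with_warmup):
    """Time two loading strategies and report both timings plus the speedup."""
    _, cold_s = time_call(load_without_warmup)
    _, warm_s = time_call(load_with_warmup)
    return {
        "without_warmup_s": cold_s,
        "with_warmup_s": warm_s,
        "speedup": cold_s / warm_s if warm_s > 0 else float("inf"),
    }
```

The harness itself is model-agnostic, which is why the fragment's "loads models twice" protocol reduces to timing two callables.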
src/transformers-custom.css
CHANGED

@@ -544,6 +544,130 @@ d-article ol li {
   margin-bottom: 0.5rem;
 }
 
+/* Enhanced tenet reference styling with tooltips */
+a[href^="#source-of-truth"],
+a[href^="#one-model-one-file"],
+a[href^="#code-is-product"],
+a[href^="#standardize-dont-abstract"],
+a[href^="#do-repeat-yourself"],
+a[href^="#minimal-user-api"],
+a[href^="#backwards-compatibility"],
+a[href^="#consistent-public-surface"],
+a[href^="#modular-toolbox"] {
+  position: relative;
+  color: #667eea;
+  font-weight: 600;
+  text-decoration: underline;
+  text-decoration-color: rgba(102, 126, 234, 0.3);
+  transition: all 0.3s ease;
+  cursor: help;
+}
+
+a[href^="#source-of-truth"]:hover,
+a[href^="#one-model-one-file"]:hover,
+a[href^="#code-is-product"]:hover,
+a[href^="#standardize-dont-abstract"]:hover,
+a[href^="#do-repeat-yourself"]:hover,
+a[href^="#minimal-user-api"]:hover,
+a[href^="#backwards-compatibility"]:hover,
+a[href^="#consistent-public-surface"]:hover,
+a[href^="#modular-toolbox"]:hover {
+  color: #4c51bf;
+  text-decoration-color: #4c51bf;
+  background: rgba(102, 126, 234, 0.1);
+  padding: 2px 4px;
+  border-radius: 4px;
+}
+
+/* Tooltip content for each tenet */
+a[href^="#source-of-truth"]:after { content: "We should be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances."; }
+a[href^="#one-model-one-file"]:after { content: "All inference (and most of training, loss is separate, not a part of model) logic visible, top-to-bottom."; }
+a[href^="#code-is-product"]:after { content: "Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial."; }
+a[href^="#standardize-dont-abstract"]:after { content: "If it's model behavior, keep it in the file; abstractions only for generic infra."; }
+a[href^="#do-repeat-yourself"]:after { content: "Copy when it helps users; keep successors in sync without centralizing behavior."; }
+a[href^="#minimal-user-api"]:after { content: "Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths."; }
+a[href^="#backwards-compatibility"]:after { content: "Evolve by additive standardization, never break public APIs."; }
+a[href^="#consistent-public-surface"]:after { content: "Same argument names, same outputs, hidden states and attentions exposed."; }
+a[href^="#modular-toolbox"]:after { content: "Provide tools and utilities, but don't force users into a rigid framework."; }
+
+/* Universal tooltip styling for tenet references */
+a[href^="#source-of-truth"]:after,
+a[href^="#one-model-one-file"]:after,
+a[href^="#code-is-product"]:after,
+a[href^="#standardize-dont-abstract"]:after,
+a[href^="#do-repeat-yourself"]:after,
+a[href^="#minimal-user-api"]:after,
+a[href^="#backwards-compatibility"]:after,
+a[href^="#consistent-public-surface"]:after,
+a[href^="#modular-toolbox"]:after {
+  position: absolute;
+  bottom: 100%;
+  left: 50%;
+  transform: translateX(-50%);
+  background: #1a202c;
+  color: white;
+  padding: 0.75rem 1rem;
+  border-radius: 8px;
+  font-size: 0.85em;
+  font-weight: 400;
+  white-space: normal;
+  width: 320px;
+  line-height: 1.4;
+  z-index: 1001;
+  opacity: 0;
+  visibility: hidden;
+  transition: opacity 0.3s ease, visibility 0.3s ease;
+  pointer-events: none;
+  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
+  margin-bottom: 8px;
+}
+
+/* Tooltip arrows */
+a[href^="#source-of-truth"]:before,
+a[href^="#one-model-one-file"]:before,
+a[href^="#code-is-product"]:before,
+a[href^="#standardize-dont-abstract"]:before,
+a[href^="#do-repeat-yourself"]:before,
+a[href^="#minimal-user-api"]:before,
+a[href^="#backwards-compatibility"]:before,
+a[href^="#consistent-public-surface"]:before,
+a[href^="#modular-toolbox"]:before {
+  content: '';
+  position: absolute;
+  bottom: 100%;
+  left: 50%;
+  transform: translateX(-50%);
+  border: 8px solid transparent;
+  border-top-color: #1a202c;
+  z-index: 1002;
+  opacity: 0;
+  visibility: hidden;
+  transition: opacity 0.3s ease, visibility 0.3s ease;
+}
+
+/* Show tooltips on hover */
+a[href^="#source-of-truth"]:hover:after,
+a[href^="#one-model-one-file"]:hover:after,
+a[href^="#code-is-product"]:hover:after,
+a[href^="#standardize-dont-abstract"]:hover:after,
+a[href^="#do-repeat-yourself"]:hover:after,
+a[href^="#minimal-user-api"]:hover:after,
+a[href^="#backwards-compatibility"]:hover:after,
+a[href^="#consistent-public-surface"]:hover:after,
+a[href^="#modular-toolbox"]:hover:after,
+a[href^="#source-of-truth"]:hover:before,
+a[href^="#one-model-one-file"]:hover:before,
+a[href^="#code-is-product"]:hover:before,
+a[href^="#standardize-dont-abstract"]:hover:before,
+a[href^="#do-repeat-yourself"]:hover:before,
+a[href^="#minimal-user-api"]:hover:before,
+a[href^="#backwards-compatibility"]:hover:before,
+a[href^="#consistent-public-surface"]:hover:before,
+a[href^="#modular-toolbox"]:hover:before {
+  opacity: 1;
+  visibility: visible;
+}
+
 /* Improve blockquote styling */
 d-article blockquote {
   font-size: 19px;
webpack.config.js
CHANGED

@@ -139,7 +139,7 @@ module.exports = {
 
   // Extract tenet text for tooltips
   const tenetTooltips = {
-    'source-of-truth': 'We
+    'source-of-truth': 'We aim to be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances.',
     'one-model-one-file': 'All inference (and most of training, loss is separate, not a part of model) logic visible, top-to-bottom.',
     'code-is-product': 'Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.',
     'standardize-dont-abstract': 'If it\'s model behavior, keep it in the file; abstractions only for generic infra.',

@@ -225,22 +225,22 @@ module.exports = {
   <script src="https://d3js.org/d3.v7.min.js"></script>
   <meta name="viewport" content="width=device-width, initial-scale=1">
   <meta charset="utf8">
-  <title>
+  <title>Scaling insanity: maintaining hundreds of model definitions</title>
   <link rel="stylesheet" href="style.css">
   <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css">
 </head>
 <body>
   <d-front-matter>
     <script id='distill-front-matter' type="text/json">{
-      "title": "
-      "description": "
+      "title": "Scaling insanity: maintaining hundreds of model definitions",
+      "description": "A peek into software engineering for the transformers library",
       "published": "Aug 21, 2025",
       "authors": [{"author": "Pablo Montalvo", "authorURL": "https://huggingface.co/Molbap"}]
     }</script>
   </d-front-matter>
   <d-title>
-    <h1>
-    <p>
+    <h1>Scaling insanity: maintaining hundreds of model definitions</h1>
+    <p>A peek into software engineering for the transformers library</p>
   </d-title>
   <d-byline></d-byline>
   <d-article>