attempt
- app.py +1 -1
- content/article.md +7 -7
app.py
CHANGED

@@ -281,7 +281,7 @@ def build_image(filename):
         for directory in ['content', 'static']:
             filepath = Path(directory) / filename
             if filepath.exists():
-                gr.
+                gr.Image(value=str(filepath), show_label=False, interactive=False, show_download_button=False)
                 return
         gr.Markdown(f"*Image not found: {filename}*")
     return _build
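For context, a minimal sketch of how the patched helper plausibly fits together: only the lines shown in the hunk above come from app.py, while the `_build` closure wrapper and the `gr.Blocks` usage are assumptions for illustration.

```python
# Sketch only: the hunk above contributes lines 281-287; the closure structure
# and the gr.Blocks usage around it are assumptions, not the actual app.py.
from pathlib import Path

import gradio as gr

def build_image(filename):
    def _build():
        for directory in ['content', 'static']:
            filepath = Path(directory) / filename
            if filepath.exists():
                # Render the image read-only: no label, no interaction, no download button.
                gr.Image(value=str(filepath), show_label=False, interactive=False, show_download_button=False)
                return
        gr.Markdown(f"*Image not found: {filename}*")
    return _build

# Hypothetical usage: instantiating components inside a Blocks context adds them to the page.
with gr.Blocks() as demo:
    build_image("some_figure.png")()
```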
content/article.md
CHANGED

@@ -262,9 +262,9 @@ To get this graph, I used the heuristic of modular inheritance.
 
 So what do we see? Llama is a basis for many models, and it shows.
 Radically different architectures such as Mamba have spawned their own dependency subgraph.
-
+{{D3_GRAPH}}
 
-
+{{graph_modular_related_models}}
 
 But there is no similar miracle for VLMs across the board.
 As you can see, there is a small DETR island, a little Llava pocket, and so on, but it's not comparable to the centrality observed around Llama.
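The "heuristic of modular inheritance" mentioned in the hunk context can be made concrete with a rough sketch: treat a model as depending on another model when its `modular_*.py` file imports from that model's folder. The paths and the use of `networkx` below are assumptions for illustration, not the article's actual tooling.

```python
# Rough sketch of the modular-inheritance heuristic, not the article's actual tooling:
# a model depends on another model if its modular_*.py imports from that model's folder.
import ast
from pathlib import Path

import networkx as nx

def modular_dependency_graph(models_dir: str) -> nx.DiGraph:
    models_path = Path(models_dir)
    graph = nx.DiGraph()
    for modular_file in models_path.glob("*/modular_*.py"):
        model = modular_file.parent.name
        graph.add_node(model)
        for node in ast.walk(ast.parse(modular_file.read_text())):
            if isinstance(node, ast.ImportFrom) and node.module:
                # e.g. "..llama.modeling_llama" or "transformers.models.llama.modeling_llama"
                for part in node.module.split("."):
                    if part != model and (models_path / part).is_dir():
                        graph.add_edge(model, part)
    return graph

# Hypothetical usage on a local checkout of transformers:
# graph = modular_dependency_graph("src/transformers/models")
```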
@@ -278,7 +278,7 @@ So I looked into Jaccard similarity, which we use to measure set differences. I
 
 {{TERMINAL}}
 
-
+{{Jaccard_similarity_plot}}
 
 The yellow areas are places where models are very different from each other. We can see islands here and there corresponding to model families. Llava goes with Llava-OneVision, LlavaNext, LlavaNext-Video, etc.
 ## VLM improvements, avoiding abstraction
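To make the similarity measure above concrete: compare two modeling files by the set of classes and functions they define, where the Jaccard index is |A ∩ B| / |A ∪ B|. The snippet below is purely illustrative, not the Space's actual implementation.

```python
# Illustrative only, not the Space's implementation: Jaccard similarity between
# the sets of top-level symbols defined in two modeling files.
import ast
from pathlib import Path

def defined_symbols(path: str) -> set:
    tree = ast.parse(Path(path).read_text())
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }

def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones.
    return len(a & b) / len(a | b) if (a | b) else 1.0

sim = jaccard(
    defined_symbols("src/transformers/models/llava/modeling_llava.py"),
    defined_symbols("src/transformers/models/llava_next/modeling_llava_next.py"),
)
print(f"Jaccard similarity: {sim:.2f}")
```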
@@ -296,7 +296,7 @@ But this is breaking [Standardize, don't abstract](#standardize-dont-abstract).
 
 This is the current state of abstractions across a modeling file:
 
-
+{{Bloatedness_visualizer}}
 
 The following [Pull request to standardize placeholder masking](https://github.com/huggingface/transformers/pull/39777) is a good example of the kind of changes that are acceptable. In a VLM, we always need to insert embeddings from various encoders at various positions, so we can have a function to do it. For Qwen2 VL, for instance, it will look like this:
 
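The hunk cuts off before the Qwen2 VL snippet itself. As a generic, hypothetical illustration of the pattern the linked PR standardizes (scattering encoder outputs into the text embedding sequence wherever a placeholder token appears), with a made-up function name and signature:

```python
# Hypothetical sketch of placeholder masking, not the code from the linked PR:
# scatter image-encoder embeddings into the text embeddings at every position
# occupied by the placeholder image token.
import torch

def place_image_embeddings(
    input_ids: torch.LongTensor,        # (batch, seq_len)
    inputs_embeds: torch.FloatTensor,   # (batch, seq_len, hidden)
    image_features: torch.FloatTensor,  # (num_image_tokens, hidden)
    image_token_id: int,
) -> torch.FloatTensor:
    mask = (input_ids == image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
    # masked_scatter consumes image_features in order, one row per masked position.
    return inputs_embeds.masked_scatter(mask, image_features.to(inputs_embeds.dtype))
```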
@@ -350,13 +350,13 @@ So the question arises naturally: How can we modularize more?
 I took a similarity measure again and looked at the existing graphs. The tool is available on this [ZeroGPU-enabled Space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor). It scans the whole transformers repository and outputs a graph of modularization candidates across models, using either a Jaccard similarity index (simple) or a SentenceTransformers embedding model. Understandably, [encoder models still have the lion's share of usage](#encoders-ftw). See also [Tom Aarsen and Arthur Bresnu's great blog post on sparse embeddings](https://huggingface.co/blog/train-sparse-encoder).
 
 
-
+{{modular_candidates}}
 
 ## <a id="encoders-ftw"></a> Encoders win!
 
 Model popularity speaks for itself! This is because encoder usage is, obviously, all about embeddings. So we have to keep the encoder side viable, usable, and fine-tunable.
 
-
+{{popular_models_barplot}}
 ## On image processing and processors
 
 Choosing to be a `torch`-first library meant shedding a tremendous support burden for `jax` and `TensorFlow`, and it also meant that we could be more liberal about the number of torch-dependent utilities we added. One of these is the _fast processing_ of images. Whereas images were previously assumed to be minimal ndarrays, making stronger assumptions and enforcing `torch`- and `torchvision`-native inputs allowed us to massively speed up the processing time for each model.
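As a rough illustration of why torch-native inputs pay off (this is not transformers' actual fast image processor): resizing and normalization become batched tensor operations, optionally on GPU, instead of per-image ndarray work.

```python
# Rough illustration only, not transformers' actual fast image processor:
# with torch-native inputs, preprocessing runs as batched tensor ops.
import torch
import torch.nn.functional as F

def fast_preprocess(images: torch.Tensor,
                    size=(224, 224),
                    mean=(0.485, 0.456, 0.406),
                    std=(0.229, 0.224, 0.225)) -> torch.Tensor:
    # images: (batch, 3, H, W), uint8 or float, possibly already on GPU
    images = images.float() / 255.0 if images.dtype == torch.uint8 else images.float()
    images = F.interpolate(images, size=size, mode="bilinear", align_corners=False)
    mean_t = torch.tensor(mean, device=images.device).view(1, 3, 1, 1)
    std_t = torch.tensor(std, device=images.device).view(1, 3, 1, 1)
    return (images - mean_t) / std_t

batch = torch.randint(0, 256, (8, 3, 640, 480), dtype=torch.uint8)
pixel_values = fast_preprocess(batch)  # (8, 3, 224, 224), normalized float32
```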
@@ -386,7 +386,7 @@ Because it is all PyTorch (and it is even more now that we support only PyTorch)
 
 It just works with PyTorch models and is especially useful when aligning outputs with a reference implementation, in keeping with our core guideline, [source of truth for model definitions](#source-of-truth).
 
-
+{{model_debugger}}
 
 ### Transformers-serve
 
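The debugging tool itself is not shown in this hunk; the following is a minimal, hypothetical sketch of the kind of check it supports: running identical inputs through a ported model and a reference implementation and asserting the outputs match.

```python
# Minimal, hypothetical sketch of output alignment, not the tool referenced above:
# run the same inputs through a reference and a candidate PyTorch model and
# check that their logits agree within tolerance.
import torch

@torch.no_grad()
def assert_outputs_aligned(reference, candidate, inputs, atol=1e-5, rtol=1e-5):
    reference.eval()
    candidate.eval()
    ref_logits = reference(**inputs).logits      # assumes transformers-style outputs
    cand_logits = candidate(**inputs).logits
    torch.testing.assert_close(cand_logits, ref_logits, atol=atol, rtol=rtol)
```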