Commit · b896046
Parent(s): d0301e3

Update README.md

update README to inform model users what the model is about, how to use it, etc
README.md CHANGED
@@ -1,3 +1,13 @@
 ---
 license: apache-2.0
 ---
+
+QuaLA-MiniLM: a Quantized Length Adaptive
+MiniLM
+
+
+The article discusses the challenge of making transformer-based models efficient enough for practical use, given their size and computational requirements. The authors propose a new approach called QuaLA-MiniLM, which combines knowledge distillation, the length-adaptive transformer (LAT) technique, and low-bit quantization. This approach trains a single model that can adapt to any inference scenario under a given computational budget, achieving a superior accuracy-efficiency trade-off on the SQuAD1.1 dataset. The authors compare their approach to other efficient methods and find that it achieves up to an 8.8x speedup with less than 1% accuracy loss. They also provide their code publicly on GitHub. The article also discusses related work in the field, including dynamic transformers and other knowledge distillation approaches.
+
+
+
+
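The commit description says the README should tell model users how to use the model, so a minimal usage sketch follows. It is illustrative only: the repo id below is a placeholder (this commit does not name the Hub id), and the sketch assumes the QuaLA-MiniLM checkpoint loads as a standard `transformers` extractive question-answering model, consistent with its SQuAD1.1 evaluation.

```python
# Minimal usage sketch, assuming the checkpoint is a standard Hugging Face
# extractive QA model fine-tuned on SQuAD1.1. The repo id is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "your-namespace/QuaLA-MiniLM"  # placeholder, not named in this commit

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
model.eval()

question = "What techniques does QuaLA-MiniLM combine?"
context = (
    "QuaLA-MiniLM combines knowledge distillation, the length-adaptive "
    "transformer (LAT) technique, and low-bit quantization."
)

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the answer span from the most likely start/end token positions.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax()) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```

Note that length-adaptive inference (meeting a latency budget by dropping tokens at intermediate layers) and low-bit quantized execution depend on the authors' released code rather than the plain `transformers` API sketched above; selecting a length configuration is not shown here.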