# Mistral-7B-Instruct-v0.2-expanded

This model uses mergekit's passthrough method to expand the blocks of the "mistralai/Mistral-7B-Instruct-v0.2" model: for every 5th layer, a new layer is added, with the `o_proj` and `down_proj` parameters of these added layers initialized to zero, mirroring the approach used in LLaMA Pro.
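
Because those two projections feed the residual stream, zeroing them makes each inserted block an identity function at initialization. A quick sanity check along these lines (a sketch only; the checkpoint path and the layer index are assumptions, not part of this repo) should report all-zero weights for an inserted layer:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed local path or hub id for the expanded checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/Mistral-7B-Instruct-v0.2-expanded")

# Pick one of the inserted blocks (index assumed here). Its o_proj and down_proj
# weights were zero-initialized, so at initialization the block contributes
# nothing beyond the residual connection.
layer = model.model.layers[4]
print(torch.count_nonzero(layer.self_attn.o_proj.weight))  # expected: tensor(0)
print(torch.count_nonzero(layer.mlp.down_proj.weight))     # expected: tensor(0)
```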

It's important to note that this configuration has not undergone fine-tuning. Therefore, when fine-tuning, ensure that only every 5th layer (the newly added layers) is trainable, while all other layers remain frozen.
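
As a rough illustration, the sketch below freezes every parameter and then re-enables gradients only for the inserted blocks. The checkpoint path and the assumption that the new blocks sit at every 5th index are placeholders, not something this repository prescribes; adjust them to match the actual layout in the configuration below.

```python
from transformers import AutoModelForCausalLM

# Assumed local path or hub id for the expanded checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/Mistral-7B-Instruct-v0.2-expanded")

# Freeze everything first ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze only the (assumed) newly inserted decoder layers.
for idx, layer in enumerate(model.model.layers):
    if (idx + 1) % 5 == 0:  # every 5th layer, per the note above
        for param in layer.parameters():
            param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```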
## 🧩 Configuration