amd / Zebra-Llama-1B-4MLA-12M2

Safetensors · llama · alignment-handbook · Generated from Trainer

Commit c231932 (verified) · committed by Mingyuyang-1 · 1 parent: 5121d53

Update README.md

Files changed (1): README.md (+2, −13)
README.md CHANGED
@@ -8,25 +8,14 @@ model-index:
  tags:
  - alignment-handbook
  - generated_from_trainer
-
+ license: apache-2.0
  ---
 
  # Zebra-Llama: Towards Extremely Efficient Hybrid Models
  Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that achieve Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.
 
-
  This model, `Zebra-Llama-1B-4MLA-12M2`, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model. It composes efficient hybrid layers, combining Multi-head Latent Attention (MLA) for KV cache compression and Mamba2 (an SSM) for computational efficiency. This approach bypasses the need for costly pre-training from scratch.
 
-
- Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that achieve Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.
-
-
- This model, Zebra-Llama-1B-4MLA-12M2, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model. It composes efficient hybrid layers, combining Multi-head Latent Attention (MLA) for KV cache compression and Mamba2 (an SSM) for computational efficiency. This approach bypasses the need for costly pre-training from scratch.
-
-
-
-
-
  The composition follows a three-stage pipeline to effectively transfer knowledge from the pre-trained Transformer.
 
 
@@ -136,4 +125,4 @@ If you find this model useful, please consider citing the original paper:
  journal={arXiv preprint arXiv:2503.11132},
  year={2025}
  }
- ```
+ ```
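
As context for the updated model card: the checkpoint it describes is a hybrid MLA/Mamba2 model adapted from Llama-3.2-1B-Instruct. The sketch below shows one plausible way to load and run it with Hugging Face Transformers. The repo id `amd/Zebra-Llama-1B-4MLA-12M2` and the `trust_remote_code=True` flag are assumptions (hybrid architectures typically ship custom modeling code), not details stated in the diff.

```python
# Minimal sketch, not taken from the model card. Assumptions: the checkpoint is
# published as "amd/Zebra-Llama-1B-4MLA-12M2" on the Hugging Face Hub, and its
# hybrid MLA/Mamba2 layers require custom modeling code (trust_remote_code=True).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Zebra-Llama-1B-4MLA-12M2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # assumed: loads the custom hybrid MLA/Mamba2 modules
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # place weights on GPU if available (needs accelerate)
)

# Simple generation check: a plain prompt is enough to verify that the model
# loads and decodes end to end.
prompt = "Briefly explain why compressing the KV cache speeds up long-context inference."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the base model is instruct-tuned, `tokenizer.apply_chat_template` would likely be the more natural entry point for chat-style prompts in practice.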