Update README.md
README.md CHANGED
@@ -8,25 +8,14 @@ model-index:
 tags:
 - alignment-handbook
 - generated_from_trainer
-
+license: apache-2.0
 ---
 
 # Zebra-Llama: Towards Extremely Efficient Hybrid Models
 Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that achieve Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.
 
-
 This model, `Zebra-Llama-1B-4MLA-12M2`, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model. It composes efficient hybrid layers, combining Multi-head Latent Attention (MLA) for KV cache compression and Mamba2 (an SSM) for computational efficiency. This approach bypasses the need for costly pre-training from scratch.
 
-
-Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that achieve Transformer-level accuracy with near-State Space Model (SSM) efficiency. While standard Transformers are limited by the quadratic complexity of self-attention and the large memory footprint of their key-value (KV) cache, Zebra-Llama offers a practical and scalable solution.
-
-
-This model, Zebra-Llama-1B-4MLA-12M2, is created by efficiently adapting the pre-trained Llama-3.2-1B-Instruct model. It composes efficient hybrid layers, combining Multi-head Latent Attention (MLA) for KV cache compression and Mamba2 (an SSM) for computational efficiency. This approach bypasses the need for costly pre-training from scratch.
-
-
-
-
-
 The composition follows a three-stage pipeline to effectively transfer knowledge from the pre-trained Transformer.
 
 

@@ -136,4 +125,4 @@ If you find this model useful, please consider citing the original paper:
 journal={arXiv preprint arXiv:2503.11132},
 year={2025}
 }
-```
+```
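The README text above describes `Zebra-Llama-1B-4MLA-12M2` as a hybrid MLA/Mamba2 checkpoint adapted from Llama-3.2-1B-Instruct. For orientation, here is a minimal generation sketch, assuming the checkpoint is loadable through `transformers` with `trust_remote_code=True` and lives under a hypothetical `amd/Zebra-Llama-1B-4MLA-12M2` repo id; the actual loading path and repo id may differ.

```python
# Minimal generation sketch (illustrative; repo id and remote-code setup are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Zebra-Llama-1B-4MLA-12M2"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # reduced precision for inference
    trust_remote_code=True,       # assumes the custom MLA/Mamba2 layer code ships with the checkpoint
).eval()

# Assumes the adapted model keeps the base Llama-Instruct chat template.
messages = [{"role": "user", "content": "Explain what a state space model is in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```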
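The README also mentions a three-stage pipeline for transferring knowledge from the pre-trained Transformer, without detailing the stages in this diff. As intuition only, the sketch below shows a generic logit-distillation loss of the kind such knowledge-transfer pipelines build on; it is not the paper's actual recipe, and the function name, temperature, and weighting are placeholders.

```python
# Generic logit-distillation step (illustration only; not the Zebra-Llama training recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-target KL distillation with the usual next-token cross-entropy."""
    # Soft targets: match the teacher's temperature-smoothed token distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard language-modeling loss on the ground-truth tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * kd + (1.0 - alpha) * ce
```

In a setup like the one the README describes, the teacher would presumably be the pre-trained Llama-3.2-1B-Instruct model and the student the hybrid MLA/Mamba2 network; how the three stages combine initialization, intermediate supervision, and end-to-end training is defined in the cited paper.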