- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Weighted Grouped Query Attention in Transformers
  Paper • 2407.10855 • Published
- Fast Transformer Decoding: One Write-Head is All You Need
  Paper • 1911.02150 • Published • 9
- MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
  Paper • 2406.09297 • Published • 6
Ahmed Ali
ahmed-ali
Recent Activity
Updated a collection about 2 hours ago: Inference-Optimization