Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction • Paper • 2505.11254 • Published May 16, 2025 • 48
UMoE: Unifying Attention and FFN with Shared Experts • Paper • 2505.07260 • Published May 12, 2025 • 9
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published Feb 27, 2024 • 627