MiniMax-01: Scaling Foundation Models with Lightning Attention • arXiv:2501.08313 • Published Jan 14, 2025 • 298 upvotes
Lizard: An Efficient Linearization Framework for Large Language Models • arXiv:2507.09025 • Published Jul 11, 2025 • 18 upvotes
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective • arXiv:2507.23632 • Published Jul 31, 2025 • 6 upvotes
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights • arXiv:2510.04800 • Published Oct 2025 • 36 upvotes
Less is More: Recursive Reasoning with Tiny Networks • arXiv:2510.04871 • Published Oct 2025 • 453 upvotes
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention • arXiv:2510.04212 • Published Oct 2025 • 22 upvotes
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models • arXiv:2510.03561 • Published Oct 2025 • 23 upvotes
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • arXiv:2510.19338 • Published Oct 2025 • 97 upvotes