Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10 • 16 • 2
Gemstones: A Model Suite for Multi-Faceted Scaling Laws Paper • 2502.06857 • Published Feb 7 • 24 • 2