ML Theory
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
Paper • 2509.26507 • Published Sep 30, 2025 • 537
Muon Outperforms Adam in Tail-End Associative Memory Learning
Paper • 2509.26030 • Published Sep 30, 2025 • 19
Why Language Models Hallucinate
Paper • 2509.04664 • Published Sep 4, 2025 • 194
Tokens
Is There a Case for Conversation Optimized Tokenizers in Large Language Models?
Paper • 2506.18674 • Published Jun 23, 2025 • 8