Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published Feb 21, 2025 • 15
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28, 2025 • 31
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs Paper • 2406.08657 • Published Jun 12, 2024 • 10
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond Paper • 2309.16583 • Published Sep 28, 2023 • 12