DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI • Paper • 2512.16676 • Published 13 days ago
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents • Paper • 2504.07164 • Published Apr 9
The Smol Training Playbook 📚 • Space • The secrets to building world-class LLMs
The Ultra-Scale Playbook 🌌 • Space • The ultimate guide to training LLMs on large GPU clusters
FineWeb: decanting the web for the finest text data at scale 🍷 • Space • Generate high-quality text data for LLMs using FineWeb
Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Article • Published 13 days ago