Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published 4 days ago • 15
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning 1 day ago • 28
The Art of Asking: Multilingual Prompt Optimization for Synthetic Data Paper • 2510.19806 • Published 6 days ago • 1
Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs Paper • 2510.20475 • Published 5 days ago • 1
Article Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text By isaacchung and 2 others • 8 days ago • 33
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models Paper • 2504.14366 • Published Apr 19 • 1
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published 12 days ago • 65
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published 13 days ago • 6
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models Paper • 2510.13580 • Published 13 days ago • 1
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples Paper • 2510.07192 • Published 20 days ago • 4
Article Model statistics of the 50 most downloaded entities on Hugging Face By lbourdois • 15 days ago • 26
Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects Paper • 2510.07890 • Published 19 days ago • 1
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs Paper • 2504.02768 • Published Apr 3 • 2
Article There is no such thing as a tokenizer-free lunch By catherinearnett • Sep 25 • 84