Ring-mini-2.0: Small Model, Great Intelligence

Community Article · Published October 3, 2025

Foreword

On September 10, 2025, Ant Group officially began open-sourcing Ling 2.0 — a series of Mixture-of-Experts (MoE) Large Language Models (LLMs) that combine state-of-the-art (SOTA) performance with high efficiency. It is the latest open-source LLM series from inclusionAI, an AI research initiative backed by Ant Group, the parent company of Alipay.

Ring-mini-2.0 is a high-performance MoE reasoning model deeply optimized on the Ling 2.0 architecture. With 16B total parameters but only 1.4B active per token, it matches the comprehensive reasoning capabilities of dense models below the 10B scale, excelling in particular at logical reasoning, coding, and mathematical tasks. It also supports a 128K context window and high-speed generation exceeding 300 tokens/s.
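For readers who want to try the model right away, here is a minimal, untested loading sketch using the standard transformers chat API. The `trust_remote_code` flag and the prompt shown are assumptions on our part — check the model card for the official quickstart.

```python
# Minimal quickstart sketch (assumptions noted inline), using the standard
# Hugging Face transformers API to load and query Ring-mini-2.0.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ring-mini-2.0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place the 16B (1.4B active) weights automatically
    trust_remote_code=True,  # assumption: MoE models often ship custom modeling code
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```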

Some glossary terms:

  • SFT — Supervised Fine-Tuning
  • RLVR — Reinforcement Learning with Verifiable Rewards
  • RLHF — Reinforcement Learning from Human Feedback
  • MTP — Multi-Token Prediction

Enhanced Reasoning: Joint Training with SFT + RLVR + RLHF

Ring-mini-2.0 is built on Ling-mini-2.0-base and further trained through a joint optimization pipeline: Long-CoT SFT, a more stable and sustained RLVR stage, and RLHF. This significantly improves the stability and generalization of its complex reasoning abilities.
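To make the RLVR idea concrete, here is an illustrative Python sketch — not the team's actual pipeline — of a verifiable reward: a deterministic checker compares the rollout's final answer against a ground-truth reference and emits a binary reward. Because the reward is exact rather than estimated by a learned reward model, the RL signal is stable and reproducible. The `Answer:` convention below is a hypothetical formatting choice.

```python
# Illustrative RLVR-style reward function (assumed answer format, not the
# official recipe): verify the rollout against a known-correct answer.
import re

def verifiable_reward(rollout: str, reference_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the reference, else 0.0."""
    # Assumed convention: the model writes its final answer after "Answer:".
    match = re.search(r"Answer:\s*(.+)", rollout)
    if match is None:
        return 0.0
    predicted = match.group(1).strip()
    return 1.0 if predicted == reference_answer.strip() else 0.0

# The binary reward then feeds a standard policy-gradient update (e.g., PPO/GRPO).
print(verifiable_reward("... so x = 4.\nAnswer: 4", "4"))  # 1.0
```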

Across multiple challenging benchmarks (LiveCodeBench, AIME 2025, GPQA, and ARC-AGI-v1), Ring-mini-2.0 significantly outperforms dense models below 10B and even rivals larger MoE models (e.g., gpt-oss-20B-medium) at equivalent output lengths. It is particularly outstanding in logical reasoning.


High Sparsity, High-Speed Generation

Inheriting the efficient MoE design of the Ling 2.0 series, Ring-mini-2.0 activates only 1.4B parameters per token. Through architectural optimizations such as a 1/32 expert activation ratio and an MTP layer, it delivers performance equivalent to a dense model of roughly 7–8B parameters.
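The arithmetic behind that sparsity is easy to sanity-check. The sketch below uses hypothetical expert counts — only the 1/32 ratio and the 16B/1.4B figures come from this post — to show how a top-k router realizes the activation ratio.

```python
# Back-of-the-envelope sketch of a 1/32 expert activation ratio.
# The expert counts are illustrative assumptions, not Ring-mini-2.0's config.
num_experts = 256          # hypothetical expert count per MoE layer
top_k = num_experts // 32  # 1/32 activation ratio -> 8 experts routed per token

activation_ratio = top_k / num_experts
print(f"{top_k} of {num_experts} experts active per token "
      f"(ratio = {activation_ratio:.4f})")   # 8 of 256 (ratio = 0.0312)

# Model-level sparsity: only ~1.4B of 16B parameters fire per token.
total_params, active_params = 16e9, 1.4e9
print(f"active fraction = {active_params / total_params:.2%}")  # 8.75%
```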

Thanks to this low-activation, high-sparsity design, Ring-mini-2.0 achieves a throughput of 300+ tokens/s when deployed on an H20 server, and 500+ tokens/s with the addition of Expert Dual Streaming inference optimization, drastically reducing the inference cost of “Thinking” models in high-concurrency scenarios. Moreover, leveraging YaRN extrapolation, it supports a 128K context window, where the relative speedup in long-output scenarios can exceed 7×.
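As a rough illustration of what YaRN extrapolation looks like in practice, the sketch below follows the common transformers `rope_scaling` convention. The exact keys, the scaling factor, and the assumed native window are all assumptions here, not Ring-mini-2.0's documented configuration.

```python
# Hedged sketch: YaRN-style RoPE extension via the common transformers
# rope_scaling convention. Values below are hypothetical; consult the model
# card for the officially supported long-context settings.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "inclusionAI/Ring-mini-2.0"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

# Hypothetical: extend an assumed 32K native window by 4x to reach 128K.
config.rope_scaling = {"rope_type": "yarn", "factor": 4.0,
                       "original_max_position_embeddings": 32768}

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, trust_remote_code=True
)
```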


Fully Open Source: Model Weights, Training Strategy, and Data Recipe

We are fully releasing the Ring-mini-2.0 model weights, training data, and the RLVR+RLHF training strategy.

With its “small and excellent” profile, Ring-mini-2.0 is poised to become the preferred choice among small reasoning models, providing an ideal starting point for research and applications in both academia and industry.

We welcome everyone to visit our open-source repository to download and use it!

So, where to find the models?

Many thanks to akhaliq (https://x.com/_akhaliq) for building an AI assistant demo with Ring-mini-2.0 and anycoder: https://huggingface.co/spaces/akhaliq/Ring-mini-2.0

  • Hugging Face: https://huggingface.co/inclusionAI/Ring-mini-2.0
  • ModelScope (CN): https://modelscope.cn/models/inclusionAI/Ring-mini-2.0
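If you prefer to fetch the full checkpoint programmatically, the standard huggingface_hub API works with the repository linked above:

```python
# Download the full Ring-mini-2.0 checkpoint to the local HF cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="inclusionAI/Ring-mini-2.0")
print(f"Model files downloaded to: {local_dir}")
```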

#Ring-V2 #mini-LLM
