Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
Abstract
A new method, Distilled Decoding 2 (DD2), enables one-step sampling for image auto-regressive models with minimal performance degradation and significant speed-up compared to previous methods.
Image Auto-regressive (AR) models have emerged as a powerful paradigm of visual generative models. Despite their promising performance, they suffer from slow generation speed due to the large number of sampling steps required. Although Distilled Decoding 1 (DD1) was recently proposed to enable few-step sampling for image AR models, it still incurs significant performance degradation in the one-step setting, and relies on a pre-defined mapping that limits its flexibility. In this work, we propose a new method, Distilled Decoding 2 (DD2), to further advance the feasibility of one-step sampling for image AR models. Unlike DD1, DD2 does not rely on a pre-defined mapping. We view the original AR model as a teacher model which provides the ground-truth conditional score in the latent embedding space at each token position. Based on this, we propose a novel conditional score distillation loss to train a one-step generator. Specifically, we train a separate network to predict the conditional score of the generated distribution and apply score distillation at every token position conditioned on previous tokens. Experimental results show that DD2 enables one-step sampling for image AR models with a minimal FID increase from 3.40 to 5.43 on ImageNet-256. Compared to the strongest baseline DD1, DD2 reduces the gap between one-step sampling and the original AR model by 67%, while simultaneously achieving up to a 12.3× training speed-up. DD2 takes a significant step toward the goal of one-step AR generation, opening up new possibilities for fast and high-quality AR modeling. Code is available at https://github.com/imagination-research/Distilled-Decoding-2.
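To make the conditional score distillation idea concrete, here is a minimal, self-contained PyTorch sketch of what such a training loop could look like. Everything here is an illustrative assumption rather than the authors' implementation: the module names (OneStepGenerator, CondScoreNet, teacher_cond_score), the toy MLP architectures, the unit-Gaussian teacher score, and the noise level SIGMA are all hypothetical placeholders standing in for the paper's actual latent-space AR teacher and networks.

```python
# Hedged sketch of conditional score distillation for one-step AR generation.
# All names, shapes, and networks below are illustrative placeholders.
import torch
import torch.nn as nn

SEQ_LEN, DIM, SIGMA = 16, 32, 0.5  # toy token count / latent dim / noise level (assumptions)


class OneStepGenerator(nn.Module):
    """One-step generator: maps a noise sequence to latent token embeddings."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(DIM, 64), nn.SiLU(), nn.Linear(64, DIM))

    def forward(self, z):  # z: (B, SEQ_LEN, DIM)
        return self.net(z)


class CondScoreNet(nn.Module):
    """Predicts the per-position conditional score of the *generated*
    distribution; a causal prefix mean stands in for a causal transformer."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * DIM, 64), nn.SiLU(), nn.Linear(64, DIM))

    def forward(self, x):  # x: (B, SEQ_LEN, DIM)
        idx = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        prefix = torch.cumsum(x, dim=1) / idx          # running mean over tokens <= i
        ctx = torch.cat([torch.zeros_like(prefix[:, :1]), prefix[:, :-1]], dim=1)
        return self.net(torch.cat([x, ctx], dim=-1))   # score at position i given tokens < i


def teacher_cond_score(x):
    """Stand-in for the ground-truth conditional score the frozen AR teacher
    provides at each token position (here: the score of a unit Gaussian)."""
    return -x


gen, fake = OneStepGenerator(), CondScoreNet()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(fake.parameters(), lr=1e-4)

for step in range(10):  # alternate the two updates
    z = torch.randn(8, SEQ_LEN, DIM)

    # (1) Denoising-score-matching-style update for the fake score network on
    #     generator samples, so it tracks the current generated distribution.
    x = gen(z).detach()
    eps = torch.randn_like(x)
    loss_f = (fake(x + SIGMA * eps) - (-eps / SIGMA)).pow(2).mean()
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()

    # (2) Generator update: the score gap (s_fake - s_teacher), summed over all
    #     token positions, acts as a reverse-KL-style distillation gradient.
    x = gen(z)
    with torch.no_grad():
        grad = fake(x) - teacher_cond_score(x)
    loss_g = (grad * x).sum() / x.size(0)  # surrogate: d(loss_g)/dx = grad
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

In the actual method, both score networks would operate in the AR teacher's latent embedding space and condition on the true token prefix; the alternating fake-score/generator updates mirror variational-score-distillation-style training, which the placeholder loop above imitates at toy scale.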
Community
Published in NeurIPS 2025
See also our previous work Distilled Decoding 1 (ICLR 2025):
https://imagination-research.github.io/distilled-decoding/
VAR and LlamaGen can be made to generate images in just one step!