arxiv:2601.00747

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Published on Jan 2

Submitted by Max Ruiz Luyten on Jan 5

University of Cambridge

AI-generated summary

Large language model training methods that optimize for correctness can collapse the diversity of reasoning paths, but a new variational framework provides principled ways to maintain both accuracy and creativity.

Abstract

State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, mainly optimizing correctness. We analyze how this design choice can collapse the model's distribution over reasoning paths, slashing semantic entropy and undermining creative problem-solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces. STaR, GRPO, DPO, entropy bonuses, and other methods all constitute special cases of the same loss. The framework delivers three core results: (i) the diversity decay theorem, describing how correctness-based objectives lead to distinct modes of diversity decay for STaR, GRPO, and DPO; (ii) designs that ensure convergence to a stable and diverse policy, effectively preventing collapse; and (iii) simple, actionable recipes to achieve this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.

Community

maxruizluyten

Paper submitter 2 days ago

For those of you interested in RLVR, here is a paper that formally characterizes the mechanism behind "diversity collapse" in reasoning models trained with methods that optimize a scalar reward (such as STaR, GRPO, and DPO).

The paper introduces a variational framework based on Shahshahani gradient flow to prove that optimizing solely for correctness inherently erodes the diversity of reasoning paths, leading to a "reasoning monoculture." To address this, it proposes Distributional Creative Reasoning (DCR), which adds a diversity energy functional (using entropy and kernel-based novelty) to the objective. This mathematically guarantees that a diverse portfolio of successful reasoning strategies is maintained while utility is still optimized.
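The mechanism can be illustrated with a toy sketch. It is not the paper's implementation. It rests on one standard fact: on a finite set of traces, the Shahshahani gradient flow of a functional is the replicator equation. The example assumes four hypothetical reasoning traces with made-up scalar utilities, and the objective F(p) = E_p[U] + lam * H(p), which keeps only an entropy term and omits the kernel-based novelty term. The names `replicator_step`, `simulate`, and `lam` are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def softmax(utility, lam):
    """Gibbs distribution p_i proportional to exp(U_i / lam)."""
    m = max(utility)
    w = [math.exp((u - m) / lam) for u in utility]
    z = sum(w)
    return [x / z for x in w]

def replicator_step(p, utility, lam, dt):
    """One Euler step of the replicator equation for F(p) = E_p[U] + lam * H(p).

    Each trace has fitness U_i - lam * log p_i. Traces with above-average
    fitness gain probability mass; the others lose it.
    """
    # Clamp inside the log to avoid a domain error if a mass underflows to 0.
    fitness = [u - lam * math.log(max(x, 1e-300)) for u, x in zip(utility, p)]
    mean_fitness = sum(x * f for x, f in zip(p, fitness))
    new_p = [x + dt * x * (f - mean_fitness) for x, f in zip(p, fitness)]
    total = sum(new_p)  # renormalize to absorb floating-point drift
    return [x / total for x in new_p]

def simulate(utility, lam, steps=20000, dt=0.01):
    """Run the flow from the uniform distribution over traces."""
    p = [1.0 / len(utility)] * len(utility)
    for _ in range(steps):
        p = replicator_step(p, utility, lam, dt)
    return p

# Hypothetical scores: three near-equally good traces and one failed trace.
UTILITY = [1.0, 0.95, 0.9, 0.0]

if __name__ == "__main__":
    for lam in (0.0, 0.2):
        p = simulate(UTILITY, lam)
        print(f"lam={lam}: p={[round(x, 4) for x in p]}, H={entropy(p):.4f}")
```

With `lam = 0.0` (utility only), nearly all mass ends up on the single highest-scoring trace, even though two others are almost as good. With `lam > 0`, the flow settles at the Gibbs distribution proportional to `exp(U / lam)`, which keeps all three strong traces in play while still suppressing the failed one.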