Papers
arxiv:2601.09001

Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM

Published on Jan 13
· Submitted by
Luciano Del Corro
on Jan 19

Abstract

Output-entropy profiles computed from final-layer next-token probabilities serve as a scalable signal for monitoring LLM performance and prioritizing data acquisition under domain shifts.

AI-generated summary

Deploying LLMs raises two coupled challenges: (1) monitoring - estimating where a model underperforms as traffic and domains drift - and (2) improvement - prioritizing data acquisition to close the largest performance gaps. We test whether an inference-time signal can estimate slice-level accuracy under domain shift. For each response, we compute an output-entropy profile from final-layer next-token probabilities (from top-k logprobs) and summarize it with eleven statistics. A lightweight classifier predicts instance correctness, and averaging predicted probabilities yields a domain-level accuracy estimate. We evaluate on ten STEM reasoning benchmarks with exhaustive train/test compositions (k in {1,2,3,4}; all "10 choose k" combinations), across nine LLMs from six families (3B-20B). Estimates often track held-out benchmark accuracy, and several models show near-monotonic ordering of domains. Output-entropy profiles are thus an accessible signal for scalable monitoring and for targeting data acquisition.

Community

Paper author Paper submitter

A first exploration of a lightweight, inference-time method for monitoring LLM accuracy under domain drift using output-entropy traces derived from next-token probabilities. This approach demonstrates promising results for slice-level accuracy estimation across STEM reasoning benchmarks, suggesting that entropy-based signals could serve as a practical tool for real-time model monitoring in production. It offers potential utility for both continuous performance tracking and prioritizing data acquisition in dynamic environments.

arXivlens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/entropy-sentinel-continuous-llm-accuracy-monitoring-from-decoding-entropy-traces-in-stem-9202-3e444ec7

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.09001 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2601.09001 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.09001 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.