arxiv:2512.22322

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Published on Dec 26 · Submitted by Yulei Qin on Dec 30
#1 Paper of the day

Abstract

Agentic reinforcement learning (RL) holds great promise for developing autonomous agents for complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing approaches treat task verification as a passive, post-hoc process: a verifier (e.g., a rule-based scoring script, a reward or critic model, or an LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeded. Processing such verbose context, laden with irrelevant and noisy history, strains verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with a dual mission: not only to complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that the SmartSnap paradigm allows LLM-driven agents to be trained in a scalable manner, bringing performance gains of up to 26.08% and 16.66% to 8B and 30B models, respectively. The synergy between solution finding and evidence seeking cultivates efficient, self-verifying agents with performance competitive with DeepSeek V3.1 and Qwen3-235B-A22B.

Community

Paper submitter

We introduce SmartSnap, a paradigm shift that transforms GUI agents 📱💻🤖 from passive task executors into proactive self-verifiers. By empowering agents to curate their own evidence of success through the 3C Principles (Completeness, Conciseness, Creativity), we eliminate the bottleneck of expensive post-hoc verification while boosting reliability and performance on complex mobile tasks.
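
As a rough illustration of the shift, here is a minimal Python sketch (not the paper's implementation; `Snapshot`, `judge_llm`, and the prompt formats are all assumed for the example). The verifier's input shrinks from the full trajectory to the agent's curated evidence:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """One observation captured during the rollout (e.g., a screenshot's text dump)."""
    step: int
    content: str

def passive_verify(judge_llm, task: str, trajectory: list[Snapshot]) -> bool:
    """Post-hoc verification: the judge wades through the entire noisy history."""
    prompt = (
        f"Task: {task}\n"
        + "\n".join(f"[step {s.step}] {s.content}" for s in trajectory)
        + "\nDid the agent succeed? Answer YES or NO."
    )
    return "YES" in judge_llm(prompt)

def smartsnap_verify(judge_llm, task: str, trajectory: list[Snapshot],
                     evidence_indices: list[int]) -> bool:
    """SmartSnap-style verification: the agent has already curated a minimal,
    decisive subset of snapshots, and the judge sees only that evidence."""
    evidence = [trajectory[i] for i in evidence_indices]
    prompt = (
        f"Task: {task}\n"
        + "\n".join(f"[evidence, step {s.step}] {s.content}" for s in evidence)
        + "\nIs this evidence valid and sufficient to prove completion? YES or NO."
    )
    return "YES" in judge_llm(prompt)
```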

SmartSnap redefines the agent's role through a unified policy that handles both task execution and evidence curation. Instead of burdening verifiers with verbose, noisy interaction trajectories, agents learn to select a minimal, decisive set of snapshot evidence from their tool interactions. The framework leverages:

  • Augmented MDP: Agents operate in an extended action space, execution actions (click, type, etc.) ⊕ curation actions (submitting evidence indices)
  • Dual-objective training: GRPO-based RL optimizes for both task completion and evidence quality
  • Dense reward shaping: A multi-component reward $R = R_{format} + R_{validity} + R_{complete} + R_{concise}$ guides agents toward becoming effective self-verifiers (see the sketch after this list)
  • Creative evidence generation: Agents proactively execute additional actions post-task to capture robust proof when needed
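
A condensed sketch of how these pieces might fit together; the action schema, reward weights, and term definitions below are illustrative guesses, not the paper's actual code:

```python
from dataclasses import dataclass, field
from typing import Union

# --- Augmented action space: execution actions ⊕ curation actions ---

@dataclass
class ExecAction:
    kind: str    # "click", "type", "scroll", ...
    args: dict = field(default_factory=dict)

@dataclass
class CurateAction:
    evidence_indices: list[int]  # snapshots the agent submits as proof

Action = Union[ExecAction, CurateAction]

# --- Dense reward shaping: R = R_format + R_validity + R_complete + R_concise ---

def shaped_reward(parsed_ok: bool, indices_valid: bool,
                  judge_accepts: bool, n_evidence: int, n_steps: int) -> float:
    """Toy composition of the four reward terms named above; the weights
    and exact term definitions are assumptions for illustration."""
    r_format = 0.1 if parsed_ok else 0.0        # curation output is well-formed
    r_validity = 0.1 if indices_valid else 0.0  # cited snapshots actually exist
    r_complete = 1.0 if judge_accepts else 0.0  # judge accepts the evidence
    r_concise = 0.2 * (1.0 - n_evidence / max(n_steps, 1))  # fewer snapshots score higher
    return r_format + r_validity + r_complete + r_concise
```

In GRPO-style training, a reward like this would be computed for each rollout in a group and normalized into group-relative advantages, so the curation behavior itself is optimized alongside task success.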

The approach achieves absolute performance gains of up to 26.08% on AndroidLab across model scales, matching or exceeding much larger models such as DeepSeek-V3.1 and Qwen3-235B-A22B.

Models citing this paper 4

Datasets citing this paper 2

Spaces citing this paper 0

Collections including this paper 1