Self-Correction as Feedback Control: Error Dynamics, Stability Thresholds, and Prompt Interventions in LLMs
Aofan Liu, Jingxiang Meng
Abstract
Iterative self-correction is increasingly deployed in agentic LLM systems, yet whether repeated refinement improves or degrades performance varies across models. We recast self-correction as a closed-loop feedback-control problem in which the same model is both controller and plant, and analyze its error dynamics via a two-state Markov model over {Correct, Incorrect}, parameterized by the Error Introduction Rate (EIR) and the Error Correction Rate (ECR). The model yields a directly measurable stability threshold (iterate only when ECR/EIR > Acc/(1−Acc)) in which EIR acts as a stability margin and prompting becomes lightweight controller design. Empirically, across 7 models and 3 datasets (GSM8K, MATH, StrategyQA), a sharp near-zero EIR boundary (< 0.5%) cleanly separates beneficial from harmful self-correction: only o3-mini (+3.4 pp), Claude Opus 4.6 (+0.6 pp), and o4-mini (±0 pp) avoid degradation, while the other four models, GPT-5 among them, lose accuracy. A verify-first prompt intervention then provides causal evidence: it drives GPT-4o-mini's EIR from 2% to 0% and converts a −6.2 pp degradation into a +0.2 pp gain (paired McNemar test, p < 10^{-4}), with negligible change on models whose EIR is already below the threshold, exactly as the diagnostic predicts. A complementary analysis of adaptive self-consistency (ASC) shows that it halts harmful refinement at a 3.8 pp confidence-elicitation cost, exposing a two-tier capability structure: prompt-level EIR suppression prevents degradation, whereas ECR enhancement, which plausibly requires training-level changes, is needed for genuine gains. Self-correction should thus be treated not as a default behavior but as a control decision governed by measurable error dynamics.
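Assuming EIR and ECR denote the per-round transition probabilities P(Incorrect at t+1 | Correct at t) and P(Correct at t+1 | Incorrect at t), the stated threshold follows from the expected-accuracy update in two lines. This is a sketch under that assumption, not the paper's own derivation:

```latex
\begin{aligned}
\mathrm{Acc}_{t+1} &= \mathrm{Acc}_t\,(1-\mathrm{EIR}) + (1-\mathrm{Acc}_t)\,\mathrm{ECR},\\
\mathrm{Acc}_{t+1} - \mathrm{Acc}_t &= (1-\mathrm{Acc}_t)\,\mathrm{ECR} - \mathrm{Acc}_t\,\mathrm{EIR} > 0
\iff \frac{\mathrm{ECR}}{\mathrm{EIR}} > \frac{\mathrm{Acc}_t}{1-\mathrm{Acc}_t},\\
\mathrm{Acc}^{*} &= \frac{\mathrm{ECR}}{\mathrm{ECR}+\mathrm{EIR}} \qquad (\mathrm{EIR}+\mathrm{ECR} > 0).
\end{aligned}
```

The ratio form assumes EIR > 0 and Acc_t < 1; the cross-multiplied inequality does not. Equivalently, one more round helps exactly when current accuracy sits below the fixed point Acc*, which is why driving EIR toward zero pushes Acc* toward 1.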
AI Impact Assessments
Scientific Impact Assessment
1. Core Contribution
This paper recasts LLM iterative self-correction as a closed-loop feedback control problem, modeling correctness evolution as a two-state Markov chain parameterized by Error Introduction Rate (EIR) and Error Correction Rate (ECR). The central insight is an operationalizable stability threshold: self-correction is beneficial only when ECR/EIR > Acc/(1−Acc). The paper identifies near-zero EIR (≲0.5%) as the sharp empirical boundary separating beneficial from harmful self-correction, validates this causally with a "verify-first" prompt intervention, and articulates a two-tier capability model distinguishing prompt-level EIR suppression from training-level ECR enhancement.
The core novelty is not the Markov formalism itself—two-state Markov chains are elementary—but rather its operationalization into a practical diagnostic. The EIR/ECR decomposition provides a more actionable lens than simply tracking accuracy curves, and the verify-first intervention demonstrates that this framing has engineering utility.
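To see why the EIR/ECR decomposition says more than an accuracy curve, the expected accuracy can be iterated directly. This is a minimal sketch assuming constant per-round EIR and ECR (the correct→incorrect and incorrect→correct transition probabilities); the function names are hypothetical and the rates in the usage note are illustrative, not the paper's measurements.

```python
def accuracy_trajectory(acc0: float, eir: float, ecr: float, rounds: int) -> list[float]:
    """Expected accuracy after each refinement round under the two-state chain.

    Assumes EIR = P(correct -> incorrect) and ECR = P(incorrect -> correct)
    stay constant across rounds (a simplifying assumption).
    """
    accs = [acc0]
    for _ in range(rounds):
        acc = accs[-1]
        # Correct items survive with prob. 1 - EIR; incorrect ones are fixed with prob. ECR.
        accs.append(acc * (1.0 - eir) + (1.0 - acc) * ecr)
    return accs


def fixed_point(eir: float, ecr: float) -> float:
    """Stationary accuracy ECR / (ECR + EIR); undefined when both rates are zero."""
    if eir + ecr == 0:
        raise ValueError("chain is frozen: EIR = ECR = 0")
    return ecr / (ecr + eir)
```

For example, `accuracy_trajectory(0.90, 0.02, 0.10, 4)` declines monotonically toward `fixed_point(0.02, 0.10)` (about 0.833) even though ECR is five times EIR: a model that starts above its fixed point can only lose accuracy by iterating.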
2. Methodological Rigor
Strengths: The experimental design is reasonably thorough: 7 models across 4 capability tiers, 3 datasets, and 4 refinement iterations. The verify-first ablation is well-designed as a causal probe—it targets EIR specifically and produces the predicted differential effect (large impact on high-EIR models, negligible on low-EIR models), with appropriate statistical testing (McNemar's test, paired bootstrap CIs).
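McNemar's test fits this design because each item is scored before and after refinement, and only the discordant pairs carry information. A minimal sketch of the exact (binomial) two-sided variant follows; which variant the paper used is not stated here, the function name is hypothetical, and the paired bootstrap CIs are not shown. Note that the two discordant counts are exactly the numerators of EIR and ECR.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: items correct before refinement but incorrect after (errors introduced)
    c: items incorrect before refinement but correct after (errors corrected)
    Under H0, each discordant item is equally likely to fall in b or c,
    so min(b, c) follows a Binomial(b + c, 0.5) tail.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of any change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For instance, `mcnemar_exact(10, 0)` returns 0.001953125 (that is, 2/1024), while balanced counts such as `mcnemar_exact(5, 5)` return 1.0.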
Concerns: Several methodological issues weaken the claims:
3. Potential Impact
The paper addresses a genuinely practical problem: deciding when to deploy self-correction loops in production LLM systems. The EIR-based diagnostic is simple to compute and could inform deployment decisions in agentic systems. The verify-first prompt is a low-cost intervention that practitioners could immediately adopt.
The two-tier capability model (EIR suppression vs. ECR enhancement) provides useful conceptual vocabulary for the field, cleanly distinguishing what prompt engineering can achieve from what requires training-level changes. This framing could influence how researchers think about self-correction capabilities.
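The two-tier distinction can be illustrated with the closed-form solution of the same chain, Acc_T = Acc* + (Acc_0 − Acc*)(1 − EIR − ECR)^T with Acc* = ECR/(ECR + EIR). The sketch below compares three hypothetical (EIR, ECR) settings invented purely for illustration: a baseline with nonzero EIR, an EIR-suppressed setting with ECR unchanged, and one where ECR is also raised. The function and scenario names are assumptions, not the paper's.

```python
def accuracy_after(acc0: float, eir: float, ecr: float, rounds: int) -> float:
    """Closed-form expected accuracy after `rounds` steps of the two-state chain."""
    if eir + ecr == 0:
        return acc0  # frozen chain: nothing is introduced or corrected
    acc_star = ecr / (ecr + eir)  # stationary accuracy
    # The gap to the fixed point shrinks by a factor (1 - EIR - ECR) per round.
    return acc_star + (acc0 - acc_star) * (1.0 - eir - ecr) ** rounds


# Hypothetical (EIR, ECR) pairs chosen only to illustrate the two tiers.
SCENARIOS = {
    "baseline: EIR > 0, low ECR": (0.02, 0.01),
    "tier 1: EIR suppressed, ECR unchanged": (0.00, 0.01),
    "tier 2: EIR suppressed, ECR enhanced": (0.00, 0.10),
}

if __name__ == "__main__":
    acc0 = 0.90
    for name, (eir, ecr) in SCENARIOS.items():
        delta_pp = 100 * (accuracy_after(acc0, eir, ecr, rounds=4) - acc0)
        print(f"{name:40s} {delta_pp:+.1f} pp after 4 rounds")
```

With these made-up rates, suppressing EIR alone turns a loss into a small gain whose size is capped by the low ECR; only raising ECR produces a substantial gain, mirroring the prevent-degradation versus genuine-gain split.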
However, the impact may be bounded by several factors: (1) the rapidly evolving model landscape means specific EIR thresholds may become outdated quickly; (2) the binary {Correct, Incorrect} state space limits applicability to open-ended generation; (3) the practical recommendation ("measure EIR on a calibration set before deploying self-correction") is somewhat obvious once stated; (4) the connection to control theory, while metaphorically appealing, doesn't leverage actual control-theoretic tools beyond the basic Markov formulation.
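The calibration recommendation in point (3) is straightforward to operationalize. A minimal sketch, assuming each calibration item is logged as a list of per-round correctness flags and that EIR and ECR are pooled over all consecutive rounds; the names `estimate_rates` and `should_iterate` are hypothetical, not taken from the paper.

```python
from typing import Sequence


def estimate_rates(trajectories: Sequence[Sequence[bool]]) -> tuple[float, float]:
    """Pooled (EIR, ECR) from per-item correctness trajectories.

    trajectories[i][t] is True if item i is correct after round t (t = 0 is the
    initial answer). Pooling transitions across rounds assumes the rates are
    roughly stationary.
    """
    c_to_i = c_total = i_to_c = i_total = 0
    for traj in trajectories:
        for prev, nxt in zip(traj, traj[1:]):
            if prev:
                c_total += 1
                c_to_i += not nxt  # correct -> incorrect: an introduced error
            else:
                i_total += 1
                i_to_c += bool(nxt)  # incorrect -> correct: a corrected error
    eir = c_to_i / c_total if c_total else 0.0
    ecr = i_to_c / i_total if i_total else 0.0
    return eir, ecr


def should_iterate(acc: float, eir: float, ecr: float) -> bool:
    """ECR/EIR > Acc/(1-Acc), cross-multiplied so EIR = 0 is safe; ties return False."""
    return ecr * (1.0 - acc) > eir * acc
```

On a toy log such as `[[True, True, True], [True, False, False], [False, True, True], [False, False, False]]`, the estimates are EIR = ECR = 0.25 at initial accuracy 0.5, which sits exactly on the threshold, so `should_iterate` returns False and no refinement compute is spent.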
4. Timeliness & Relevance
The paper is highly timely. Agentic LLM systems with self-correction loops are proliferating rapidly, and the question of when self-correction helps versus hurts is practically urgent. The inclusion of very recent models (GPT-5, Claude Opus 4.6, o4-mini) enhances relevance. The finding that GPT-5 degrades under self-correction despite frontier capability is particularly noteworthy and counter-intuitive.
The paper also connects to the broader test-time compute scaling literature, where understanding when to allocate compute to refinement versus other strategies (like self-consistency) is an active research question.
5. Strengths & Limitations
Key Strengths:
Key Limitations:
Additional Observations
The paper's comparison showing Self-Consistency outperforming iterative refinement at matched compute (93.4% vs. 86.6%) is a useful practical finding, though not novel. The "accuracy-correction paradox" terminology effectively captures an important phenomenon, though the underlying pool-size asymmetry argument has been informally noted in prior work.
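The pool-size asymmetry behind the "accuracy-correction paradox" shows up in a single round of arithmetic. The rates below are hypothetical and chosen only for illustration, with EIR and ECR read as transition probabilities:

```latex
\begin{aligned}
\text{lost} &= \mathrm{Acc}\cdot\mathrm{EIR} = 0.90 \times 0.02 = 0.018,\\
\text{gained} &= (1-\mathrm{Acc})\cdot\mathrm{ECR} = 0.10 \times 0.10 = 0.010,\\
\Delta\mathrm{Acc} &= 0.010 - 0.018 = -0.008 \;(-0.8\ \text{pp}),
\qquad \frac{\mathrm{ECR}}{\mathrm{EIR}} = 5 < \frac{0.90}{0.10} = 9.
\end{aligned}
```

Even a model that fixes errors five times as often as it introduces them loses accuracy here, because at 90% accuracy the pool of correct answers exposed to EIR is nine times larger than the pool of incorrect answers exposed to ECR.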
The paper would benefit from a more honest positioning: it is primarily an empirical contribution with a simple but useful mathematical framework, rather than a deep theoretical contribution to either control theory or Markov chain analysis.
Generated May 5, 2026
Comparison History (29)
Paper 2 provides a fundamental theoretical framework (control theory) to explain and predict LLM self-correction, addressing a major inconsistency in the field. By establishing a mathematically grounded and empirically validated stability threshold, it offers a rigorous, actionable metric for system design. This bridges theoretical analysis with practical application, likely driving broader and more foundational impact than the architectural improvements proposed in Paper 1.
Paper 1 offers a rigorous, theoretically grounded framework by mapping LLM self-correction to a feedback control problem. By establishing measurable stability thresholds (EIR/ECR), it resolves inconsistencies in current literature regarding when self-correction actually works. Its combination of mathematical modeling, causal empirical testing, and actionable diagnostics gives it exceptional methodological rigor and broad applicability, likely making a larger fundamental impact than the architectural approach of Paper 2.
Paper 1 provides a novel theoretical framework by applying closed-loop feedback control to LLM self-correction. By establishing measurable stability thresholds (EIR/ECR), it offers a principled explanation for inconsistent self-correction performance across models. This fundamental theoretical insight, combined with strong empirical validation, has broad implications for the design and evaluation of autonomous AI agents, likely yielding a higher and more enduring scientific impact than the specific architectural intervention proposed in Paper 2.
Paper 2 introduces a novel theoretical framework (feedback control theory applied to LLM self-correction) with broad applicability across the rapidly growing field of agentic AI systems. It provides a measurable stability threshold, actionable diagnostic criteria, and causal evidence via prompt interventions—offering both theoretical depth and practical guidance. Paper 1, while technically solid, is primarily an engineering optimization study for a specific ASR deployment scenario with incremental contributions. Paper 2's cross-cutting insights into when self-correction helps vs. harms have wider implications for LLM system design, making it more likely to influence future research directions.
Paper 2 likely has higher impact due to timeliness and breadth: it targets widely deployed LLM agent behaviors and provides a simple, measurable stability criterion (ECR/EIR threshold) with strong empirical validation across multiple models/datasets plus causal prompt intervention evidence. The feedback-control framing is novel and potentially general across iterative reasoning, tool use, and agent loops, enabling practical diagnostics and design guidance. Paper 1 is rigorous and valuable but more domain-specific (chess, human-in-the-loop overrides) with narrower immediate cross-field relevance.
Paper 1 offers a clearer methodological contribution: a formal feedback-control framing with a simple, testable Markov error-dynamics model that yields an explicit stability threshold and actionable diagnostics (EIR/ECR) validated across models/datasets, plus a causal prompt intervention with strong statistics. This combination of theory + measurable quantities + prescriptive guidance is likely to generalize widely across LLM agent design and evaluation. Paper 2 is timely and potentially broad, but appears more exploratory/empirical with a metaphor-to-architecture mapping that may be harder to formalize and reproduce as a general scientific principle.
Paper 1 likely has higher impact due to a substantial, reusable benchmark artifact (300 tasks, trajectory-aware evidence, fine-grained rubrics) that can become community infrastructure for evaluating agent reliability, safety, and robustness across modalities—high real-world relevance and broad applicability. Its methodological contribution (multi-channel auditing, multi-trial metrics) addresses pressing evaluation failures in deployed agents. Paper 2 is novel and timely with a clear theoretical framing and actionable prompting insight, but its scope is narrower (self-correction on a few NLP benchmarks) and may have less cross-field and tooling impact than a widely adoptable evaluation suite.
Paper 1 likely has higher impact: it delivers a substantial, reusable evaluation infrastructure (300 tasks, trajectory-aware evidence, 2,159 rubric items) directly addressing urgent gaps in agent benchmarking (safety/robustness, multimodality, interaction paradigms). This enables broad, cross-model and cross-lab comparability and can influence deployment standards and research directions across AI safety, HCI, and agent systems. Paper 2 is novel and rigorous with a useful control-theoretic framing, but its scope is narrower (self-correction on select datasets) and may translate more as a diagnostic/prompting guideline than a field-wide benchmark resource.
Paper 2 addresses a critical open problem in LLMs (inconsistent self-correction) by elegantly framing it as a feedback control problem. By establishing measurable stability thresholds (EIR/ECR), it provides both a strong theoretical foundation and actionable prompt interventions. This novel cross-disciplinary approach will likely broadly influence how agentic loops are designed and evaluated, offering deeper scientific impact than Paper 1's practical but more conventional cost-routing POMDP framework.
Paper 2 likely has higher impact due to broad, immediate applicability: it introduces a benchmark (AgentFloor) and large-scale evaluation corpus that can standardize comparisons, drive model routing decisions, and influence both research and production agent design. Releasing tasks, harness, sweeps, and runs increases reuse and citation potential across academia and industry. While Paper 1 offers a novel control-theoretic framing and useful diagnostic for self-correction, its scope is narrower (self-correction dynamics) and may be less broadly adopted than a widely usable benchmark for tool-using agents.
Paper 2 is more likely to have higher scientific impact: it introduces a novel, general feedback-control framing of LLM self-correction with a simple, testable stability criterion, validated across multiple models/datasets with causal prompt interventions and statistical testing. Its applications are immediate for agentic LLM pipelines, evaluation, and safety/reliability, and the insights generalize across tasks and model families. Paper 1 is ambitious and practically relevant, but resembles scaling/aggregation of existing offline RL + transformer ideas; impact depends heavily on reproducibility, compute access, and whether it materially advances MARL beyond dataset scale.
Paper 2 provides a novel theoretical framework recasting LLM self-correction as feedback control, yielding a measurable stability threshold (ECR/EIR > Acc/(1-Acc)) that offers actionable diagnostic criteria. Its interdisciplinary contribution bridging control theory and LLM behavior, empirical validation across multiple models/datasets, and causal prompt intervention evidence make it broadly impactful. It addresses a fundamental question about when self-correction helps vs. harms, relevant to all agentic LLM deployments. Paper 1, while practically useful for multi-agent infrastructure optimization, is more incremental engineering with narrower speedup improvements.
Paper 1 has higher likely impact due to strong methodological rigor and immediate applicability to a fast-moving, high-stakes domain (LLM agents). It introduces a concrete, testable control-theoretic model with measurable quantities (EIR/ECR), derives an actionable stability threshold, and provides multi-model, multi-dataset evidence plus causal intervention with statistical testing. Its guidance directly affects how practitioners design self-correction/verification loops, with broad relevance to reliability, safety, and evaluation. Paper 2 is ambitious and cross-disciplinary, but such unification frameworks often face harder empirical falsification and slower uptake unless the validations are exceptionally definitive.
Paper 2 offers a fundamental paradigm shift by formalizing LLM self-correction as a feedback-control problem. By introducing quantifiable metrics (EIR, ECR) and a mathematical stability threshold, it resolves widespread empirical inconsistencies in agentic LLM performance. This theoretical grounding, validated across top-tier models, provides actionable insights for prompt engineering and model training. While Paper 1 presents a solid technical solution for LoRA composition, Paper 2 addresses a ubiquitous, highly debated problem in reasoning and autonomous agents, guaranteeing wider interdisciplinary and practical impact across the rapidly growing field of agentic AI.
Paper 1 establishes fundamental computational complexity results (NP-hardness, #P-hardness) for exact conditioning in autoregressive models, providing theoretical foundations that will remain relevant as long as these models are used. These hardness results formalize widely-held intuitions and have broad implications across NLP, music generation, and any constrained generation task. Paper 2 offers a useful practical framework for self-correction but is more incremental—recasting an empirical phenomenon via a simple Markov model with limited novelty. Paper 1's theoretical contributions have broader, more lasting impact across multiple fields.
Paper 2 addresses a critical, timely issue in AI—LLM self-correction degradation—using a highly novel and rigorous control-theory framework. Its theoretical and empirical contributions have broad applicability across the rapidly expanding field of agentic AI. In contrast, Paper 1 offers a valuable but more niche methodological contribution tied to specific data management ecosystems within battery materials science, limiting its cross-disciplinary impact.
Paper 1 is likely higher impact: it introduces a principled control-theoretic framing of LLM self-correction with a measurable stability threshold (ECR/EIR) that generalizes across tasks/models and yields actionable interventions (verify-first) with strong causal evidence. The methodology links theory to empirical diagnostics and provides broadly applicable guidance for agentic LLM design, affecting reliability, evaluation, and deployment across many domains. Paper 2 is timely and useful for political multi-agent pipelines, but its scope is narrower and more application-specific, with less cross-field generality than a feedback-control stability framework.
Paper 2 offers a highly novel theoretical framework by casting LLM self-correction as a feedback-control problem, addressing a major inconsistency in agentic systems. Its derivation of measurable stability thresholds provides deep foundational insights into error dynamics, applicable across all LLMs. While Paper 1 presents a strong practical method for process rewards in specific domains, Paper 2's theoretical grounding and causal analysis of self-correction have broader, field-wide implications for AI reliability and agent design.
Paper 2 likely has higher impact: it introduces a new user-facing paradigm (interactive unlearning at inference time) plus a concrete, efficient method (STAMP with low-rank acceleration) that enables practical on-device model editing—highly timely given privacy, safety, and regulatory pressures. The approach has broad applications (data erasure, misinformation, harmful content) and crosses security/privacy/ML systems. Paper 1 offers a valuable theoretical framing and diagnostic for self-correction, but its primary contribution is analysis and prompting guidance with narrower downstream leverage compared to scalable unlearning capabilities.
Paper 1 offers a more novel and rigorous theoretical framework by recasting LLM self-correction as a feedback-control problem with a measurable stability threshold (ECR/EIR), validated across 7 models and 3 datasets with causal evidence from prompt interventions. It provides actionable, principled guidance for a widely-used agentic LLM technique. Paper 2 provides useful diagnostics for prompt optimization but addresses a narrower scope with more incremental insights (interaction effects are null, success depends on output format). Paper 1's control-theoretic framing has broader interdisciplinary appeal and deeper implications for LLM system design.