Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
Ye Yu, Xiaopeng Yuan, Haibo Jin, Heming Liu, Yaoning Yu, Haohan Wang
Abstract
Recent advances in LLM agents enable systems that autonomously refine workflows, accumulate reusable skills, self-train their underlying models, and maintain persistent memory. However, we show that such self-evolution is often non-monotonic: adapting to new task distributions can progressively degrade previously acquired capabilities across all major evolution channels. We identify this phenomenon as capability erosion under self-evolution and show that it consistently emerges across workflow, skill, model, and memory evolution. To mitigate this issue, we propose Capability-Preserving Evolution (CPE), a general stabilization principle that constrains destructive capability drift during continual adaptation. Across all four evolution dimensions, CPE consistently improves retained capability stability while preserving adaptation performance. For example, in workflow evolution, CPE improves retained simple-task performance from 41.8% to 52.8% under GPT-5.1 optimization while simultaneously achieving stronger complex-task adaptation. Our findings suggest that stable long-horizon self-evolving agents require not only acquiring new capabilities, but also explicitly preserving previously learned ones during continual adaptation.
AI Impact Assessments
Scientific Impact Assessment (1 model)
Core Contribution
This paper identifies and formalizes "capability erosion under self-evolution" — the phenomenon where self-evolving LLM agents progressively lose previously acquired capabilities as they adapt to new task distributions. The authors frame this as occurring across four evolution dimensions (workflow, skill/tool, model, and memory), propose a unified interference-based explanation, and introduce Capability-Preserving Evolution (CPE) as a general stabilization principle with dimension-specific instantiations.
The conceptual contribution is essentially transplanting the well-studied catastrophic forgetting problem from continual learning into the broader context of self-evolving agents, where mutable state extends beyond model parameters to include workflows, skill repositories, and memory stores. This reframing is the paper's primary intellectual contribution.
Methodological Rigor
Theoretical Framework: The formalization in Sections 2-3 is clean but relatively straightforward. Proposition 1 is a direct Taylor expansion showing that gradient updates along directions with positive curvature in the old-task Hessian increase old-task loss; this is essentially a restatement of well-known continual learning theory. Proposition 2 shows that quadratic regularization suppresses this effect, which is similarly well established. The theoretical contribution is more organizational than novel.
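The argument behind the two propositions can be sketched as follows. This is the textbook second-order derivation, not the paper's exact statement. The notation is assumed for illustration: \theta^\ast is the old-task solution, \Delta the update, H_old the old-task Hessian (taken to be positive definite so that its inverse exists), g_new the new-task gradient at \theta^\ast, and \lambda the regularization strength. The penalty is also assumed to be weighted by H_old, which the paper may or may not do.

```latex
% Assumption: \theta^\ast is a local minimizer of the old-task loss,
% so \nabla \mathcal{L}_{\mathrm{old}}(\theta^\ast) = 0 and the linear term vanishes.
\begin{align}
\mathcal{L}_{\mathrm{old}}(\theta^\ast + \Delta)
  &= \mathcal{L}_{\mathrm{old}}(\theta^\ast)
   + \tfrac{1}{2}\,\Delta^\top H_{\mathrm{old}}\,\Delta
   + O(\lVert \Delta \rVert^3), \\
% Regularized update, with the new-task loss linearized around \theta^\ast:
\Delta_\lambda
  &= \arg\min_{\Delta}\; g_{\mathrm{new}}^\top \Delta
   + \tfrac{\lambda}{2}\,\Delta^\top H_{\mathrm{old}}\,\Delta
   = -\tfrac{1}{\lambda}\, H_{\mathrm{old}}^{-1}\, g_{\mathrm{new}}, \\
\tfrac{1}{2}\,\Delta_\lambda^\top H_{\mathrm{old}}\,\Delta_\lambda
  &= \tfrac{1}{2\lambda^2}\, g_{\mathrm{new}}^\top H_{\mathrm{old}}^{-1}\, g_{\mathrm{new}}.
\end{align}
```

The first line shows that the old-task loss rises, to second order, whenever \Delta has a component along an eigenvector of H_old with positive eigenvalue. The last line shows that under the assumed penalty this increase shrinks as 1/\lambda^2, which is the suppression effect attributed to Proposition 2.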
Experimental Design: The experiments span four dimensions but exhibit several concerns:
1. Workflow evolution (τ2-Bench + EvoAgentX): The simple/complex task split relies on an LLM classifier, which is validated but still introduces subjectivity. Results show CPE improving simple-task retention from 41.8% to 52.8% with GPT-5.1, which is a meaningful gain, although absolute performance remains modest.
2. Skill/tool evolution (MATH + MemSkill-style): Improvements are relatively small (e.g., 82.6% → 84.4% on Algebra with GPT-4o mini). The bounded repository capacity of 30 skills creates an artificial constraint that exacerbates forgetting — it's unclear how results would change with larger repositories.
3. Model evolution (MedMCQA + STaR + LoRA): This is essentially standard EWC applied to LoRA continual fine-tuning. Improvements are marginal (29.4% → 30.5% on Anatomy with Qwen3-0.6B). The baseline performance is very low, raising questions about whether the self-training pipeline itself is effective.
4. Memory evolution (Dynamic Cheatsheet): Average retention gap reduces from 2.3% to 0.7%, which is modest. The experimental setup with only 30 training problems per domain is quite small-scale.
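For readers unfamiliar with the EWC baseline mentioned in item 3, the following is a minimal sketch of a diagonal EWC penalty on a toy problem. It uses NumPy and two hypothetical quadratic tasks, not LoRA adapters or MedMCQA. All names (`fisher_diag`, `adapt`, `new_target`) and all numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ewc_penalty_grad(theta, theta_old, fisher_diag, lam):
    """Gradient of the diagonal EWC penalty (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return lam * fisher_diag * (theta - theta_old)

# Hypothetical toy setup: the old task is sensitive to coordinate 0 only,
# while the new task pulls both coordinates toward 1.
theta_old = np.array([0.0, 0.0])      # parameters after old-task training
fisher_diag = np.array([10.0, 0.0])   # stand-in for the old-task diagonal Fisher
new_target = np.array([1.0, 1.0])     # optimum of the new task

def old_loss(theta):
    """Old-task loss: a quadratic bowl weighted by the same diagonal curvature."""
    return 0.5 * np.sum(fisher_diag * (theta - theta_old) ** 2)

def new_loss_grad(theta):
    """Gradient of the new-task loss 0.5 * ||theta - new_target||^2."""
    return theta - new_target

def adapt(lam, steps=500, lr=0.05):
    """Gradient descent on the new task, plus the EWC penalty when lam > 0."""
    theta = theta_old.copy()
    for _ in range(steps):
        grad = new_loss_grad(theta) + ewc_penalty_grad(theta, theta_old, fisher_diag, lam)
        theta -= lr * grad
    return theta

theta_vanilla = adapt(lam=0.0)  # unconstrained adaptation
theta_ewc = adapt(lam=1.0)      # adaptation with the EWC penalty

print("old-task loss, vanilla:", old_loss(theta_vanilla))
print("old-task loss, EWC:    ", old_loss(theta_ewc))
```

In this toy case the penalty holds the coordinate that matters to the old task near its original value and leaves the unconstrained coordinate free to adapt. The cost is a higher new-task loss, which is the stability-plasticity trade-off the assessment alludes to.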
Missing elements: No results report error bars, confidence intervals, or statistical significance tests (the task-split validation is the only exception), so the reliability of the reported gains is hard to judge. The paper also lacks ablation studies on the regularization strength λ across dimensions.
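To illustrate why the missing uncertainty estimates matter, the sketch below computes a normal-approximation confidence interval for the reported Anatomy gain (29.4% to 30.5%). The test-set size is not stated in this assessment, so `n = 1000` is a purely hypothetical value. The helper `diff_ci` is likewise an illustrative name, not something from the paper.

```python
import math

def diff_ci(p_base, p_new, n_base, n_new, z=1.96):
    """Normal-approximation CI for p_new - p_base, treating the two runs as independent samples."""
    diff = p_new - p_base
    se = math.sqrt(p_base * (1 - p_base) / n_base + p_new * (1 - p_new) / n_new)
    return diff - z * se, diff + z * se

# Reported accuracies (vanilla vs. CPE); n is a HYPOTHETICAL test-set size.
n = 1000
low, high = diff_ci(0.294, 0.305, n, n)
print(f"95% CI for the gain: [{low:+.3f}, {high:+.3f}]")
```

Under this assumed sample size the interval spans zero, so a gain of about one percentage point could not be distinguished from noise. If per-item predictions on a shared test set were available, a paired test such as McNemar's would be the more appropriate choice.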
Potential Impact
The paper's greatest potential impact is conceptual rather than technical. By framing capability erosion as a cross-cutting concern across all self-evolution channels, it draws attention to an important systems-level challenge that individual evolution frameworks typically ignore. This framing could influence how future self-evolving agent systems are designed and evaluated.
However, the technical solutions are largely adaptations of existing continual learning techniques (EWC for model evolution, capacity management heuristics for skill/memory, anchor constraints for workflows). Practitioners working on any individual dimension would likely already be aware of analogous forgetting problems. The paper's value proposition thus depends on whether the unified perspective genuinely enables insights that dimension-specific analysis would miss.
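As an illustration of what a capacity management heuristic for a bounded skill repository might look like, here is a minimal sketch. It is not the paper's CPE instantiation. The classes `Skill` and `BoundedSkillRepo`, the least-used eviction rule, and the notion of protected domains are all assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    domain: str
    uses: int = 0

@dataclass
class BoundedSkillRepo:
    capacity: int
    protected_domains: set = field(default_factory=set)
    skills: dict = field(default_factory=dict)

    def add(self, skill):
        """Insert a skill; when full, evict the least-used skill from an unprotected domain."""
        if skill.name not in self.skills and len(self.skills) >= self.capacity:
            candidates = [s for s in self.skills.values()
                          if s.domain not in self.protected_domains]
            if not candidates:
                return False  # every stored skill is protected, so reject the newcomer
            victim = min(candidates, key=lambda s: s.uses)
            del self.skills[victim.name]
        self.skills[skill.name] = skill
        return True

# Hypothetical usage: algebra skills are protected while later domains arrive.
repo = BoundedSkillRepo(capacity=2, protected_domains={"algebra"})
repo.add(Skill("factor_quadratic", "algebra", uses=1))
repo.add(Skill("angle_chase", "geometry", uses=5))
repo.add(Skill("mod_arithmetic", "number_theory"))
print(sorted(repo.skills))
```

In this usage the geometry skill is evicted even though it has more recorded uses than the algebra skill, because only unprotected entries are eligible. A plain least-used policy would have evicted the algebra skill instead, which is the kind of erosion the assessment describes for bounded repositories.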
The practical applicability is limited by the controlled, sequential task-shift setup. Real-world self-evolving agents encounter much more complex, non-stationary, and interleaved evolution trajectories. The paper acknowledges this limitation but doesn't address it.
Timeliness & Relevance
The paper is well-timed. Self-evolving agents (Voyager, EvoAgentX, MemSkill, etc.) are a rapidly growing research area, and the stability question is genuinely underexplored. The survey by Gao et al. (2026) that the authors cite focuses on forward progress, and this paper complements it by highlighting the retention side. The use of very recent models (GPT-5.1, GPT-5 nano, Qwen3) demonstrates engagement with the current frontier.
Strengths
1. Breadth of coverage: Studying all four evolution dimensions in a single paper provides a convincing case that capability erosion is a general phenomenon, not dimension-specific.
2. Clear problem identification: The three-part characterization of erosion (retrospective decay, behavioral policy drift, generalization erosion) is well-articulated.
3. Practical relevance: As agents become more autonomous, understanding their failure modes during self-evolution is increasingly important.
4. Unified theoretical framing: While not technically deep, the shared formalization across dimensions is pedagogically valuable.
Limitations
1. Limited technical novelty: CPE instantiations are straightforward adaptations of known techniques — EWC for parameters, capacity management for repositories, anchor constraints for workflows. The "general principle" amounts to "add regularization to prevent forgetting," which is the core insight of continual learning from nearly a decade ago.
2. Modest empirical improvements: Many improvements are small (1-3 percentage points for model and skill evolution), and without statistical significance testing, it's hard to assess reliability.
3. Artificial experimental setup: Sequential domain shifts with clean boundaries don't reflect realistic agent deployment. The controlled setting makes the problem easier to study but limits generalizability claims.
4. Scalability unclear: Experiments use relatively small models (0.6B, 3B parameters), small skill repositories (30 entries), and small training sets (30-1000 examples per domain).
5. No comparison with other continual learning methods: Only vanilla vs. CPE is compared. How do other established continual learning techniques (progressive neural networks, experience replay, etc.) perform?
6. The "unified" perspective is somewhat superficial: While the mathematical formalism looks unified, the actual CPE implementations are entirely different across dimensions, connected only by the general idea of "constrain updates to preserve old capabilities."
Overall Assessment
This paper makes a useful conceptual contribution by systematically documenting capability erosion across multiple self-evolution channels and raising awareness of this failure mode in the self-evolving agents community. However, the technical contributions are incremental — the theoretical analysis reformulates known continual learning principles, and the practical solutions are straightforward adaptations of existing techniques. The empirical evaluation, while broad, lacks depth in terms of statistical rigor, ablations, and comparisons with alternative approaches. The paper is best understood as a position-and-empirical-evidence paper that opens a research direction rather than one that provides definitive solutions.
Generated May 12, 2026
Comparison History (20)
Paper 1 addresses a fundamental and broadly applicable problem—capability erosion in self-evolving LLM agents—that affects the entire rapidly growing field of autonomous AI systems. It identifies a novel phenomenon across multiple evolution dimensions and proposes a general mitigation framework (CPE). Its breadth of impact spans all LLM agent applications, making it highly timely and relevant. Paper 2, while technically strong with impressive error reductions in materials science, addresses a narrower domain-specific problem (crystal slab generation) with more limited cross-field applicability.
Paper 2 addresses a fundamental and broadly applicable problem—capability erosion in self-evolving LLM agents—that affects all major evolution channels (workflow, skill, model, memory). This identifies a new phenomenon with clear parallels to catastrophic forgetting but in a novel agent context, making it highly relevant as autonomous agents become widespread. Paper 1, while technically rigorous with its prefix-aware internal reward model, addresses a more specific optimization challenge (credit assignment in GRPO training). Paper 2's broader scope, timeliness given the rise of autonomous agents, and cross-cutting implications give it higher potential impact.
Paper 1 addresses a fundamental and underexplored problem in the rapidly growing field of self-evolving LLM agents—capability erosion during continual adaptation. It identifies a systematic phenomenon across four evolution dimensions and proposes a general mitigation framework (CPE), offering broad technical impact across AI/ML research. Paper 2 provides valuable empirical insights on AI productivity heterogeneity via an RCT, but its findings (skill-dependent gains, scaffolding helps) are more incremental and domain-specific. Paper 1's novelty, breadth of technical contribution, and timeliness in the fast-moving agents space give it higher potential impact.
Paper 2 likely has higher impact due to timeliness and broad applicability to rapidly emerging self-improving LLM agent systems. It identifies a concrete, general failure mode (capability erosion) across multiple adaptation channels and proposes a unifying mitigation principle (CPE) with quantitative gains, suggesting actionable guidance for real-world agent deployment and continual learning research. Paper 1 is a valuable unifying framework for fairness-of-explanations, but as a primarily theoretical/survey contribution its immediate methodological and deployment impact may be less direct than an empirically demonstrated stabilization approach for lifelong agents.
Paper 1 introduces a novel conceptual framework (model-adaptive tool necessity) and reveals a fundamental 'knowing-doing gap' in LLM tool use through rigorous mechanistic analysis of hidden states. The discovery that cognition and action probes become orthogonal in late layers provides deep interpretability insights with broad implications for agent design. Paper 2 addresses capability erosion in self-evolving agents—an important but more incremental contribution that parallels well-known catastrophic forgetting in continual learning. Paper 1's mechanistic findings are more novel and likely to inspire new research directions across LLM agent and interpretability communities.
Paper 1 addresses a fundamental bottleneck in AI research—catastrophic forgetting in lifelong LLM agents. Its proposed solution, CPE, spans multiple evolution channels (workflow, skill, model, memory), promising broad applicability across the rapidly growing field of autonomous agents. While Paper 2 presents an excellent domain-specific application of foundation models for power grids, Paper 1 has significantly broader cross-disciplinary potential and relevance to the immediate trajectory of general AI development.
Paper 2 identifies a fundamental and previously underexplored problem—capability erosion in self-evolving LLM agents—that cuts across multiple evolution dimensions (workflow, skill, model, memory). This broad, principled finding has wider implications for the rapidly growing field of autonomous AI agents and connects to classical continual learning challenges. Its proposed CPE framework is general-purpose and applicable across agent architectures. Paper 1, while solid, addresses a more specific problem (long document understanding) with incremental engineering contributions. Paper 2's timeliness and breadth of impact across the agent ecosystem give it higher potential scientific influence.
Paper 1 addresses a highly timely and critical bottleneck in LLM agent development: catastrophic forgetting during lifelong self-evolution. Given the massive current focus on autonomous AI agents, its proposed capability-preserving framework has immediate, widespread applicability and high citation potential. While Paper 2 presents a novel neurosymbolic approach to explainability, Paper 1's focus on fundamental LLM agent stability is likely to have a broader and more immediate impact across the AI community.
Paper 1 addresses a fundamental barrier in the development of autonomous, lifelong-learning agents: capability degradation (or catastrophic forgetting) across multiple evolutionary dimensions. By identifying this phenomenon and proposing a general stabilization principle, it provides a crucial foundational step for long-horizon agent autonomy. Paper 2 offers a practical and efficient compute-optimization technique, but its scope is narrower and more incremental compared to the broad, systemic implications of solving lifelong learning degradation in Paper 1.
Paper 1 addresses a fundamental theoretical and practical bottleneck in lifelong learning for LLM agents (capability erosion) and introduces a general stabilization principle. Its insights into continuous agent evolution have broad implications for the development of autonomous, self-improving AI systems. While Paper 2 provides a valuable and highly practical benchmark for GUI agents, Paper 1 offers deeper methodological innovation and addresses a core challenge that spans multiple dimensions of AI evolution, giving it a higher potential for foundational scientific impact.
Paper 1 targets a timely, widely observed bottleneck in self-improving LLM agents—catastrophic/gradual forgetting during continual self-evolution—and proposes a general stabilization principle (CPE) shown across multiple evolution channels (workflow, skill, model, memory). Given the rapid deployment of adaptive agents, its applications are immediate and broad across agent frameworks, continual learning, and AI safety/reliability. Paper 2 is methodologically rigorous and novel in formal methods, but its impact is likely narrower to verification/synthesis communities. Overall, Paper 1 has higher near-term, cross-field impact potential.
Paper 2 likely has higher impact: it identifies a broad, fundamental failure mode in lifelong self-evolving agents (capability erosion) spanning multiple evolution channels (workflow, skills, model, memory) and proposes a general mitigation principle (CPE) with sizable empirical gains, making it widely applicable beyond coding agents. Its relevance is high for deploying long-horizon adaptive agents safely and reliably. Paper 1 offers useful design insights for cross-domain memory in coding benchmarks, but its demonstrated gains are modest and the scope is narrower.
Paper 1 addresses a fundamental and broadly relevant problem—capability degradation in self-evolving LLM agents—that affects the rapidly growing field of autonomous AI agents. It identifies a novel phenomenon (capability erosion under self-evolution) across multiple evolution dimensions and proposes a general mitigation principle (CPE). Given the explosive interest in LLM agents, this work is highly timely and has broad applicability. Paper 2, while technically rigorous in combining game theory with federated learning and synthetic data, addresses a more niche intersection of topics with narrower immediate impact.
Paper 1 addresses a fundamental bottleneck in autonomous AI systems: catastrophic forgetting in lifelong learning LLM agents. By identifying 'capability erosion' across multiple channels and proposing a general mitigation strategy, it offers broad methodological impact for developing self-evolving AGI architectures. Paper 2 provides a valuable diagnostic tool for LLM calibration, but its focus is narrower. The broader scope, architectural implications, and focus on the rapidly growing field of lifelong agent adaptation give Paper 1 higher potential for widespread scientific impact.
Paper 1 targets a foundational, broadly relevant problem for long-horizon autonomous LLM agents: non-monotonic self-improvement and catastrophic-like capability erosion across workflow/skills/model/memory. Its unifying diagnosis plus a general stabilization principle (CPE) applicable across multiple adaptation channels suggests wider cross-field impact (continual learning, agent systems, RLHF, safety/reliability) and strong real-world relevance as agents become persistent and self-modifying. Paper 2 is timely and useful for audio-visual LLM robustness, but its scope is narrower to AVQA and modality-specific reasoning, limiting breadth compared to Paper 1’s agent-centric generality.
Paper 2 addresses a critical and highly timely challenge (capability erosion) in self-evolving LLM agents. Given the massive industry and academic focus on autonomous, self-improving AI systems, identifying and mitigating this issue has profound implications for AI safety, continual learning, and practical agent deployment. While Paper 1 offers strong architectural improvements for sequence modeling, Paper 2's focus on the fundamental stability of lifelong learning agents promises broader, more immediate real-world applications and multi-disciplinary impact.
Paper 1 targets a broadly important and timely problem—capability degradation in lifelong, self-evolving LLM agents—across multiple adaptation channels (workflow, skills, model, memory), and proposes a general stabilization principle (CPE) with cross-dimensional empirical validation. This has wide real-world applicability for deployed continual-learning agents and connects to foundational issues like continual learning/catastrophic forgetting, making its impact potentially broad across AI safety, agentic systems, and MLOps. Paper 2 is valuable but appears more benchmark- and framework-specific to puzzle-style reasoning, with narrower applicability.
Paper 2 introduces a novel formal framework (CCRM) for a widely observed but previously unformalized phenomenon in LLM agent pipelines. It provides rigorous mathematical treatment with closed-form results, information-theoretic bounds, and strong empirical validation on SWE-bench (fitting with <0.001 error). The clean theoretical contributions (5 formal results) offer immediately actionable insights for practitioners designing retry strategies. Paper 1 addresses an important problem (capability erosion) but is more empirical/incremental, extending known catastrophic forgetting concepts to LLM agents. Paper 2's mathematical rigor and practical applicability give it broader and more lasting impact.
Paper 1 introduces a concrete, generalizable phenomenon (capability erosion in self-evolving agents) and proposes an actionable mitigation principle (CPE) validated across multiple evolution channels (workflow, skill, model, memory). This yields direct implications for building reliable lifelong agents and connects to continual learning/stability-plasticity, likely prompting follow-up methods and benchmarks. Paper 2 makes an important evaluation argument with taxonomy and recommendations, but is more normative/methodological and may yield slower, practice-dependent uptake. Overall, Paper 1 has higher novelty and clearer algorithmic leverage with broad applicability.
Paper 1 targets a timely, broadly relevant problem in LLM-based autonomous agents: non-monotonic self-improvement and catastrophic-like forgetting across workflow/skills/model/memory. Its unified framing (“capability erosion under self-evolution”) plus a general mitigation principle (CPE) could influence agent design, continual learning, and safety/alignment practices across many applications. Paper 2 is solid and practical for resource-constrained MARL, but is narrower (benchmark-centric, distillation in MARL is established). Overall, Paper 1 has higher cross-field breadth and near-term relevance given rapid deployment of self-adapting LLM agents.