From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation

Shuaike Li, Kai Zhang, Xianquan Wang, Jiachen Liu, Shengpeng Mo

May 27, 2026

arXiv:2605.28303v1

cs.AI (primary)

#579 of 2682 · Artificial Intelligence

Tournament Score: 1471 ± 48

Win Rate: 70%

Rating: 7.2 / 10

Significance 7.5 · Rigor 6.8 · Novelty 7.5 · Clarity 7.5

Abstract

While Knowledge Editing (KE) enables efficient updates, its dominant Static Fact Overwriting paradigm treats LLMs as discrete databases, forcibly injecting isolated facts. Fracturing pre-trained logical topologies, this triggers Epistemic Dissonance -- a pathology where un-evolved legacy priors force the model to explicitly negate the injected update. Idealized interventions reveal that this is an inherent structural flaw rather than mere algorithmic noise, with a zero-distortion proxy yielding a catastrophic 95.6% self-refutation rate. Given the causally driven nature of real-world knowledge, grounding updates in explicit causal narratives effectively collapses this conflict rate to just 6.6%, underscoring the imperative for a paradigm shift toward Causal Editing. To internalize this evolution, we propose CODE (Causal On-policy Distillation for Editing). By coupling causal bootstrapping with asymmetric on-policy distillation, CODE engraves causal transition logic directly into parametric memory. Experiments on LLaMA-3.1 and Qwen-2.5 show CODE drastically suppresses self-refutation to 1.8% while securing robust multi-hop accuracy (up to 83.5%), seamlessly transforming discrete fact injection into coherent knowledge evolution. Code is available at https://github.com/CrashBugger/CODE.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: "From Fact Overwriting to Knowledge Evolution: Causal Editing via On-Policy Self-Distillation"

1. Core Contribution

This paper identifies and formalizes a critical failure mode in knowledge editing called Epistemic Dissonance, in which an edited LLM initially retrieves the injected fact but then explicitly negates it because of conflicting legacy priors. The key insight is that this is not merely an algorithmic artifact but a structural flaw of the "Static Fact Overwriting" paradigm itself, demonstrated through an elegant Force-decode intervention that produces a 95.6% self-refutation rate even on an unaltered model. The proposed solution, CODE, reframes knowledge editing as causal narrative internalization via two-stage on-policy self-distillation: (1) Causal Bootstrapping via SFT on teacher-sampled trajectories, and (2) Causal Internalization via asymmetric on-policy KL-divergence minimization between an open-book teacher (with causal scaffold) and a closed-book student.
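To make the "asymmetric KL" idea concrete, the following is a minimal sketch of a per-token KL term between a teacher and a student distribution, computed in both directions. It is an illustration only, not the authors' implementation: the function names, the toy logits, and the choice to average over tokens are all assumptions, and the assessment does not state which KL direction CODE actually uses. In an on-policy setup the trajectory would be sampled from the student, with the teacher scoring the same tokens while also seeing the causal scaffold in its context; here both sets of logits are simply written by hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def sequence_kl(teacher_logits, student_logits, direction="forward"):
    """Mean per-token KL along one trajectory.

    direction="forward" computes KL(teacher || student);
    direction="reverse" computes KL(student || teacher).
    The default is arbitrary and is NOT a claim about CODE's choice.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need two non-empty logit sequences of equal length")
    if direction not in ("forward", "reverse"):
        raise ValueError("direction must be 'forward' or 'reverse'")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p_teacher, p_student = softmax(t_row), softmax(s_row)
        if direction == "forward":
            total += kl_divergence(p_teacher, p_student)
        else:
            total += kl_divergence(p_student, p_teacher)
    return total / len(teacher_logits)

# Hypothetical toy logits: 2 token positions, vocabulary of 3.
# The teacher is confident (it "sees" the scaffold); the student is not.
teacher = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
student = [[0.5, 0.4, 0.0], [0.0, 0.5, 0.2]]

forward_kl = sequence_kl(teacher, student, "forward")
reverse_kl = sequence_kl(teacher, student, "reverse")
print(f"KL(teacher || student) = {forward_kl:.3f}")
print(f"KL(student || teacher) = {reverse_kl:.3f}")
```

The two directions give different values on the same pair of distributions, which is why an ablation over KL direction is a meaningful experiment.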

2. Methodological Rigor

Strengths in experimental design:

The Force-decode experiment is methodologically clever — by showing that even a pristine model with perfect recall suffers catastrophic self-refutation, it isolates the paradigm limitation from algorithmic noise. This is a clean ablation of the problem.

The Self-Refutation Rate (SRR) metric with its two-tier probing (direct + adversarial) is well-motivated, though the adversarial probe is admittedly a stress test rather than a naturalistic evaluation.

Comprehensive ablation studies systematically validate each component (causal scaffold, bootstrapping, internalization, KL direction).

Evaluation across two model families (LLaMA-3.1, Qwen-2.5) and two dataset types (counterfactual, temporal) strengthens generalizability claims.
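As a rough illustration of how a two-tier Self-Refutation Rate could be tallied, the sketch below counts judge verdicts per probe type. The record layout, the probe labels, and the "refuted under at least one probe" aggregation are assumptions made for this example; the assessment does not spell out how the paper combines the direct and adversarial probes, and the verdicts here are hand-written rather than produced by an LLM judge.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    edit_id: str    # which edit was probed (hypothetical identifier)
    probe: str      # "direct" or "adversarial"
    refuted: bool   # judge verdict: did the model negate the injected fact?

def self_refutation_rate(results, probe=None):
    """Fraction of probed responses flagged as self-refuting.

    If `probe` is given, only results of that probe type are counted.
    """
    selected = [r for r in results if probe is None or r.probe == probe]
    if not selected:
        raise ValueError("no results match the requested probe type")
    return sum(r.refuted for r in selected) / len(selected)

def per_edit_rate(results):
    """Fraction of edits refuted under at least one probe (assumed aggregation)."""
    if not results:
        raise ValueError("no results given")
    refuted_by_edit = {}
    for r in results:
        refuted_by_edit[r.edit_id] = refuted_by_edit.get(r.edit_id, False) or r.refuted
    return sum(refuted_by_edit.values()) / len(refuted_by_edit)

# Hand-written toy verdicts for four edits, two probes each.
results = [
    ProbeResult("e1", "direct", False), ProbeResult("e1", "adversarial", True),
    ProbeResult("e2", "direct", False), ProbeResult("e2", "adversarial", False),
    ProbeResult("e3", "direct", True),  ProbeResult("e3", "adversarial", True),
    ProbeResult("e4", "direct", False), ProbeResult("e4", "adversarial", False),
]

print(self_refutation_rate(results, "direct"))       # 0.25
print(self_refutation_rate(results, "adversarial"))  # 0.5
print(per_edit_rate(results))                        # 0.5
```

Reporting the direct and adversarial rates separately keeps the stress-test number from being read as a naturalistic failure rate.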

Concerns:

The causal narratives are synthetically generated by DeepSeek-V3, introducing a dependency on an external powerful model. While the authors argue real-world updates naturally come with causal documentation, this assumption may not always hold — many factual updates lack neat causal explanations (e.g., statistical corrections, measurement updates).

The SRR metric relies on LLM-as-a-Judge (DeepSeek-V4-Flash), creating potential circularity and evaluation variance. Though the detection prompt is well-specified, no inter-annotator agreement or human validation is reported for this core metric.

The paper evaluates only on MQuAKE variants. Testing on other KE benchmarks (e.g., CounterFact, zsRE) would strengthen claims about paradigm-level generality.

The comparison with CaKE is somewhat uneven — CaKE uses multi-hop data augmentation, and while the authors create an augmented CODE variant for fair comparison, CaKE's higher M-Acc on Qwen-2.5 (87.8% vs 77.3% at 2-hop) is underemphasized. The trade-off between accuracy and SRR deserves more nuanced discussion.

3. Potential Impact

Theoretical impact: The paper makes a conceptually important argument — that LLMs should not be treated as key-value stores. The Epistemic Dissonance framework provides a useful diagnostic lens for the KE community. The Force-decode experiment could become a standard diagnostic tool for future KE methods.

Practical impact: CODE demonstrates that causal grounding substantially improves knowledge editing reliability. The batch editing efficiency (~55s per edit at batch 90 on a single RTX 4090) and general capability retention make it practically viable. The preservation (and even improvement) of reasoning benchmarks (GSM8k, BBH) after editing is notable.

Broader influence: The paper bridges knowledge editing with cognitive science concepts (causal mental models) and on-policy distillation, potentially inspiring cross-pollination. The framing of "knowledge evolution" vs. "fact overwriting" could reshape how the community thinks about model updates.

4. Timeliness & Relevance

This paper is highly timely. Recent works (Liu et al., 2025; Xie et al., 2025; Yang et al., 2025) have increasingly questioned the superficial success metrics of knowledge editing. The paper synthesizes these concerns into a coherent diagnosis (Epistemic Dissonance) and offers a constructive solution. The on-policy self-distillation approach leverages cutting-edge training paradigms (2024-2026 references), placing it at the frontier of both KE and distillation research.

5. Strengths & Limitations

Key Strengths:

Compelling problem identification: The Force-decode experiment is the paper's most impactful contribution — it definitively separates paradigm limitations from implementation issues.

Principled solution design: The asymmetric teacher-student architecture with information bottleneck (open-book vs. closed-book) is elegant and well-motivated.

Strong empirical gains: SRR reduction from ~30-50% to ~2-6% while maintaining or improving accuracy is substantial.

Thorough presentation: Extensive appendices with qualitative examples, prompt templates, and implementation details support reproducibility.

Rationale Alignment metric: Verifying that the model can explain *why* knowledge changed (83-98% RA) adds a meaningful dimension beyond accuracy.

Notable Weaknesses:

Scalability concerns: Only tested on 7-8B models; the authors acknowledge uncertainty about scaling to 70B+ where "topological inertia" may differ.

Causal narrative dependency: The requirement for plausible causal narratives limits applicability to knowledge updates that admit causal explanations. Not all factual corrections are causally motivated.

Limited benchmark diversity: Reliance solely on MQuAKE variants constrains the generalizability of findings.

Training cost: Despite amortization, CODE requires per-edit optimization (even if batched), making it fundamentally more expensive than locate-and-edit methods for single edits (~569s vs. seconds).

Ethical double-edge: The authors honestly acknowledge that making misinformation "topologically seamless" creates dual-use risks — this is a genuine concern worth monitoring.
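The two cost figures quoted in this assessment (~55s per edit at batch size 90, ~569s for a single edit) can be put side by side with some quick arithmetic. This is a back-of-the-envelope sketch: both inputs are approximate, and it assumes that the batched figure is total wall-clock time divided by batch size and that unbatched edits would run strictly one after another.

```python
# Figures taken from the assessment text; both are approximate ("~").
BATCH_SIZE = 90
SECONDS_PER_EDIT_BATCHED = 55   # amortized cost per edit at batch size 90
SECONDS_SINGLE_EDIT = 569       # cost of one edit performed alone

# Total wall-clock time for one batch of 90 edits (assumed: per-edit cost x batch size).
batch_total_seconds = BATCH_SIZE * SECONDS_PER_EDIT_BATCHED
batch_total_minutes = batch_total_seconds / 60

# How much cheaper each edit becomes when amortized over the batch.
amortization_factor = SECONDS_SINGLE_EDIT / SECONDS_PER_EDIT_BATCHED

# Hypothetical cost of applying the same 90 edits one at a time.
unbatched_total_seconds = BATCH_SIZE * SECONDS_SINGLE_EDIT

print(f"batch of {BATCH_SIZE}: {batch_total_seconds} s ({batch_total_minutes} min)")
print(f"amortization factor: {amortization_factor:.1f}x")
print(f"same edits unbatched: {unbatched_total_seconds} s")
```

Under these assumptions a full batch takes about 82.5 minutes, roughly a tenth of the time the same edits would need if applied individually, though still far above the seconds-scale cost the assessment attributes to locate-and-edit methods.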

Additional Observations

The paper's framing is occasionally hyperbolic ("paradigm shift," "engraves," "cognitive scaffold"), which somewhat oversells what is fundamentally a well-designed fine-tuning approach with causal data augmentation and on-policy distillation. The conceptual contribution (identifying the problem) may ultimately be more influential than the specific method, as the causal narrative requirement introduces constraints that future work will need to relax. The code release enhances reproducibility.

Rating: 7.2 / 10

Significance 7.5 · Rigor 6.8 · Novelty 7.5 · Clarity 7.5

Generated May 28, 2026

Comparison History (20)

vs. Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

claude-opus-4.6 · 5/28/2026

Paper 1 addresses the fundamental and timely challenge of reasoning efficiency in LLMs through a rigorous, controlled experimental framework. Its systematic taxonomy of CoT compression types, combined with insights about SFT vs. RL dynamics, has broader applicability across the field. The findings about data scaling, memorization risks, and how RL decomposes compressed reasoning provide actionable guidance for practitioners. Paper 2 tackles an important but narrower problem in knowledge editing with a clever solution, but Paper 1's breadth of impact—spanning reasoning, post-training methodology, and data efficiency—gives it higher potential influence.

vs. SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment

gemini-3.1 · 5/28/2026

Paper 1 challenges a fundamental flaw in the dominant Knowledge Editing paradigm ('Epistemic Dissonance') and proposes a novel causal editing approach. This conceptual shift from static fact injection to coherent knowledge evolution has broad implications for maintaining LLM reliability, reasoning, and alignment post-update, offering potentially wider impact across the field than Paper 2's more specialized RL optimization technique for agent skill internalization.

vs. AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

claude-opus-4.6 · 5/28/2026

AIBuildAI-2 addresses the broader and more impactful problem of automating AI model development with a knowledge-enhanced agent system, demonstrating state-of-the-art results on established benchmarks (MLE-Bench). Its potential to democratize AI for scientific discovery across biology, physics, and chemistry gives it wider cross-disciplinary impact. While Paper 2 presents a novel contribution to knowledge editing with the CODE framework and addresses an important problem (epistemic dissonance), its scope is narrower, focusing specifically on LLM fact updating. Paper 1's practical applications and broader accessibility implications give it higher estimated impact.

vs. When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

gpt-5.2 · 5/28/2026

Paper 2 is more novel and potentially higher impact: it reframes knowledge editing from static fact overwriting to causal knowledge evolution, identifies and quantifies a structural failure mode (self-refutation/epistemic dissonance), and proposes a concrete method (CODE) with large reported gains across two major model families plus released code. Its applications (continual updating, safety/policy updates, domain knowledge maintenance) are broad and timely. Paper 1 is rigorous and useful for agent evaluation, but is primarily a disentangling/benchmarking study with narrower conceptual novelty and more limited cross-field reach.

vs. Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

gemini-3.1 · 5/28/2026

Paper 2 establishes a fundamental theoretical limitation (kernel obstruction theorem) regarding LLMs' ability to perform causal discovery, proving that standard paradigms intrinsically fail regardless of scale or data. Fundamental impossibility theorems combined with mathematically grounded solutions (A-CBO) typically yield broader, longer-lasting impact across machine learning and scientific reasoning than specific algorithmic improvements like the knowledge editing method proposed in Paper 1.

vs. Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs

claude-opus-4.6 · 5/28/2026

Paper 2 identifies a fundamental and broadly impactful phenomenon—the Representation-Action Gap—in omnimodal LLMs, showing that models encode sensory conflicts internally but fail to act on them. This has wide implications for AI safety, multimodal reasoning, and alignment. The finding that the bottleneck is in translation rather than perception reframes how the field should approach grounding in multimodal models. Paper 1 addresses an important but narrower problem in knowledge editing. While both are rigorous, Paper 2's breadth of impact across multimodal AI, its novel diagnostic framework (IMAVB), and its relevance to rapidly proliferating omnimodal systems give it higher potential impact.

vs. PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

gemini-3.1 · 5/28/2026

Paper 1 addresses a fundamental and pervasive issue in LLMs—epistemic dissonance during knowledge editing. By shifting the paradigm from static fact overwriting to causal editing, it offers a widely applicable solution to safely update foundation models without causing self-refutation. While Paper 2 presents an innovative continual learning framework for embodied agents, Paper 1's focus on the core mechanisms of LLM parametric memory provides broader theoretical implications and more immediate real-world utility across various NLP domains.

vs. RULER: Representation-Level Verification of Machine Unlearning

gemini-3.1 · 5/28/2026

Paper 1 exposes a fundamental flaw in how machine unlearning is currently evaluated and introduces a robust, representation-level verification framework. By setting a new, rigorous standard for unlearning verification, it has profound implications for AI privacy, security, and legal compliance (e.g., GDPR). While Paper 2 offers a valuable advancement in LLM knowledge editing, Paper 1's potential to redefine the evaluation paradigm of an entire subfield gives it higher broader scientific impact.

vs. PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

gpt-5.2 · 5/28/2026

Paper 1 likely has higher scientific impact: it proposes a novel paradigm shift (static fact overwriting → causal knowledge evolution), introduces a concrete method (CODE) with strong empirical gains on major LLMs, and targets a broadly relevant problem in LLM reliability/maintenance with cross-domain implications (model editing, continual learning, alignment). Paper 2 is valuable and timely but primarily contributes a domain-specific benchmark for petroleum engineering; its impact is narrower and more application-bound, with less methodological novelty and broader-field influence than Paper 1.

vs. An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning

claude-opus-4.6 · 5/28/2026

Paper 1 addresses a fundamental limitation in knowledge editing for LLMs—epistemic dissonance from static fact overwriting—and proposes a novel paradigm shift toward causal editing with strong empirical results (self-refutation reduced from 95.6% to 1.8%). It introduces a conceptually innovative framework (CODE) with broad implications for how LLMs internalize knowledge updates. Paper 2, while practical, presents an incremental engineering contribution combining existing LLM and SMT planning components for industrial automation, with limited evaluation scale (23 test cases) and narrower applicability. Paper 1's theoretical depth, novelty, and broader relevance to the LLM research community give it higher impact potential.

vs. LACUNA: Safe Agents as Recursive Program Holes

gemini-3.1 · 5/28/2026

Paper 2 offers higher scientific impact by addressing a fundamental structural flaw in Knowledge Editing—epistemic dissonance. While Paper 1 provides a valuable systems-level programming model for agent safety, Paper 2 proposes a paradigm shift in how foundational models internalize updates, moving from discrete fact overwriting to causal knowledge evolution. Its dramatic reduction of self-refutation rates (from 95.6% to 1.8%) demonstrates deep methodological rigor and solves a critical bottleneck in maintaining the factual accuracy and logical coherence of LLMs, ensuring broad applicability across all domains relying on dynamically updated models.

vs. From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

gemini-3.1 · 5/28/2026

Paper 1 addresses a fundamental flaw in LLM knowledge editing (epistemic dissonance) by introducing a novel causal editing paradigm. Its impact spans across general AI development, lifelong learning, and model reliability. Paper 2, while methodologically rigorous in addressing data leakage, focuses on a domain-specific benchmarking problem (financial trading agents), limiting its breadth of impact compared to the foundational algorithmic contributions of Paper 1.

vs. A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

gemini-3.1 · 5/28/2026

Paper 1 addresses a fundamental flaw in LLM knowledge editing (epistemic dissonance) by introducing a paradigm shift from static overwriting to causal evolution. This offers deep theoretical insights into parametric memory and provides a highly scalable solution for keeping LLMs updated without full retraining. While Paper 2 presents a valuable automated benchmark generation tool, Paper 1 has higher potential for fundamental scientific impact in LLM alignment and continuous learning.

vs. CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

claude-opus-4.6 · 5/28/2026

Paper 1 identifies a fundamental structural flaw in knowledge editing ('Epistemic Dissonance') with rigorous causal analysis, proposes a paradigm shift from static fact overwriting to causal editing, and delivers a novel method (CODE) with dramatic improvements (95.6% → 1.8% self-refutation). This addresses a core limitation in LLM updating with broad implications for the field. Paper 2 presents solid engineering contributions for tool-augmented agents but is more incremental, combining known ideas (DAGs, compositional tools, reward shaping) into a well-designed system. Paper 1's conceptual novelty and foundational impact on knowledge editing give it the edge.

vs. The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

gemini-3.1 · 5/28/2026

Paper 2 introduces a frontier-level foundational MoE model with significant scale and highly efficient activation, tailored for real-world agentic tasks and self-evolution. Foundational models of this caliber typically generate widespread impact across multiple domains in AI research and industry applications. While Paper 1 offers a highly novel methodological advance in knowledge editing, Paper 2's comprehensive system design, RL framework, and broad utility give it higher potential for widespread scientific and practical impact.

vs. The Illusion of Opting in AI-Mediated Consequential Decisions

gemini-3.1 · 5/28/2026

Paper 2 offers a concrete, empirically validated solution to a critical technical bottleneck in current LLMs (Knowledge Editing). Its methodological rigor, demonstrated by significant quantitative improvements on state-of-the-art models and the release of open-source code, promises immediate real-world utility and broad impact in AI development. While Paper 1 provides a valuable philosophical framework for AI ethics, Paper 2's direct applicability and measurable results give it a higher potential for driving immediate scientific and technological advancement.

vs. Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

gpt-5.2 · 5/28/2026

Paper 1 introduces a concrete, novel technical paradigm (Causal Editing) and an implemented method (CODE) addressing a clearly identified failure mode (self-refutation/epistemic dissonance) in LLM knowledge editing, with strong quantitative gains across models and potential broad downstream use in model updating, safety, and continual learning. Paper 2 is timely and valuable as a conceptual/framework contribution for evaluation in low-resource contexts, with wide policy relevance, but it appears less methodologically empirical and less likely to shift core model capabilities. Overall, Paper 1 has higher potential scientific impact.

vs. Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains

gemini-3.1 · 5/28/2026

Paper 2 addresses a fundamental flaw in Knowledge Editing for Large Language Models, a highly active field with massive cross-disciplinary applications. Its novel causal distillation approach shows significant empirical improvements on modern LLMs. While Paper 1 provides a valuable standardization framework for Prognostics and Health Management (PHM), Paper 2's focus on LLM knowledge evolution offers broader, more timely, and more transformative potential scientific impact.

vs. Position: AI Safety Requires Effective Controllability

claude-opus-4.6 · 5/28/2026

Paper 1 presents a concrete technical contribution (CODE) addressing a well-defined problem (Epistemic Dissonance in knowledge editing) with strong empirical results showing dramatic improvements (self-refutation reduced from 95.6% to 1.8%, multi-hop accuracy up to 83.5%). It introduces a novel paradigm shift from static fact overwriting to causal editing with a reproducible method. Paper 2 is a position paper that, while raising important conceptual points about controllability vs. alignment, offers a preliminary benchmark and architectural framework without deep technical solutions. Paper 1's methodological rigor and immediately actionable contributions give it higher near-term scientific impact.

vs. MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

gpt-5.2 · 5/28/2026

Paper 2 targets a core limitation of knowledge editing—self-refutation/epistemic dissonance—argues it is structural, and introduces a paradigm shift to causal editing with an explicit method (CODE) validated on major open models (LLaMA-3.1, Qwen-2.5) with large reported gains (e.g., self-refutation down to 1.8%, strong multi-hop accuracy). This is timely and broadly relevant to model updating, safety, and reliability, with clear real-world applications. Paper 1 is useful for agent engineering, but is more incremental and narrower in cross-field impact.