RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla

#42 of 2292 · Artificial Intelligence
Tournament Score: 1571±26
Win Rate: 76% (47 wins, 15 losses, 62 matches)
Rating: 5.8/10

Abstract

Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data. We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments across harmful knowledge suppression, misinformation correction, and personal data erasure demonstrate that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, advancing transparent and on-device control over learned knowledge, with potential extensions to multimodal foundation models.

AI Impact Assessments

(3 models)

Scientific Impact Assessment: RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

1. Core Contribution

The paper introduces Interactive Machine Unlearning (IMU), a paradigm shift from provider-centric to user-centric machine unlearning. Rather than requiring model service providers to run retraining pipelines, users can issue natural language instructions to remove targeted knowledge at inference time. The proposed framework, RePAIR, orchestrates three models: a watchdog (intent detection), a surgeon (code generation), and a patient (the model being repaired). The technical core is STAMP, a training-free, single-sample unlearning method that uses closed-form pseudoinverse updates to redirect MLP activations toward a refusal subspace. A low-rank variant (STAMP-LR) reduces computational complexity from O(d³) to O(r³ + r²·d).
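The closed-form idea can be illustrated with a minimal sketch. This is not the paper's STAMP implementation: it assumes a single forget activation `a`, a hypothetical refusal target `t`, and a dense weight matrix `W` stored as nested lists, and it applies the generic minimum-norm rank-one update W' = W + (t - W a) a^+, where a^+ = a^T / (a^T a) is the Moore-Penrose pseudoinverse of a nonzero column vector. All function names are illustrative.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def pinv_vector(a):
    """Moore-Penrose pseudoinverse of a nonzero column vector: a^T / (a^T a)."""
    sq_norm = sum(v * v for v in a)
    if sq_norm == 0.0:
        raise ValueError("pseudoinverse sketch requires a nonzero activation")
    return [v / sq_norm for v in a]


def steer_weights(W, a_forget, target):
    """Return W' = W + (target - W a) a^+ so that W' a_forget == target.

    Assumption: one forget sample and one refusal target; no retain buffer.
    """
    residual = [t - y for t, y in zip(target, matvec(W, a_forget))]
    a_pinv = pinv_vector(a_forget)
    # Rank-one outer product residual * a^+ added to W, row by row.
    return [[w + r * p for w, p in zip(row, a_pinv)]
            for row, r in zip(W, residual)]


W = [[1.0, 0.0], [0.0, 1.0]]
a_forget = [1.0, 2.0]        # hypothetical forget activation
refusal = [3.0, -1.0]        # hypothetical refusal-direction output
W_new = steer_weights(W, a_forget, refusal)
```

In this sketch, any input orthogonal to `a_forget` produces the same output before and after the update, which gives some intuition for how a closed-form edit might leave unrelated behavior intact. The paper's retain-buffer mechanism and low-rank variant are not reproduced here.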

The problem formulation is genuinely novel — no prior work frames unlearning as an interactive, user-driven process at inference time. The decomposition into "when/what/how to unlearn" is clean and well-motivated by privacy regulations (GDPR, CCPA).

2. Methodological Rigor

Strengths in methodology:

  • The STAMP method is mathematically well-grounded, using Moore-Penrose pseudoinverse for closed-form weight updates. The derivation from activation steering to pseudoinverse solutions is clearly presented.
  • The low-rank approximation (STAMP-LR) is a principled efficiency optimization with clear complexity analysis.
  • The layer selection strategy based on cosine divergence between forget and refusal activations (Figure 5) provides an empirically justified intervention point.
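The layer-selection idea can be sketched as follows, under stated assumptions. The review only says the strategy is based on cosine divergence between forget and refusal activations; choosing the layer with the largest divergence, and defining divergence as one minus cosine similarity, are assumptions made here for illustration. The function names and toy activations are hypothetical.

```python
import math


def cosine_divergence(u, v):
    """Return 1 - cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("activations must be nonzero")
    return 1.0 - dot / (norm_u * norm_v)


def select_layer(forget_acts, refusal_acts):
    """Pick the layer index with the largest forget/refusal divergence.

    forget_acts[i] and refusal_acts[i] are per-layer activation vectors.
    Assumption: the most divergent layer is the intervention point.
    """
    divs = [cosine_divergence(f, r) for f, r in zip(forget_acts, refusal_acts)]
    best = max(range(len(divs)), key=divs.__getitem__)
    return best, divs


# Toy activations for three layers (made-up numbers).
forget = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
refusal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer, divergences = select_layer(forget, refusal)
```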
Concerns:

  • The experimental setup has notable limitations. The paper uses "reduced subsets" of WMDP and MMLU with free-form answer generation rather than standard MCQ evaluation, making results not directly comparable to published benchmarks. This undermines the ability to contextualize performance claims.
  • The retain buffer requirement (even at 10%) is somewhat contradictory to the claimed "training-free" nature. While no backpropagation occurs, the need for retain data at inference time is a significant practical limitation that partially undermines the user-centric narrative.
  • The evaluation of the full pipeline (Table 3) relies on LLM-as-judge evaluation (Mistral-7B evaluating outputs), which introduces assessment noise without clear calibration.
  • The single-sample experiment (Table 8) is compelling but somewhat artificial — all baselines are trained for only one epoch on a single sample, which is hardly a fair comparison since these methods were designed for batch settings. The result demonstrates STAMP's unique capability rather than exposing baseline weaknesses.
  • The synthetic biographical data (2K profiles from Mistral-7B API) for personal data erasure raises questions about ecological validity — real memorized personal data behaves differently from synthetically injected data.
3. Potential Impact

Practical applications:

  • The framework addresses a real governance gap: users currently have no mechanism to directly exercise their "right to be forgotten" with deployed LLMs.
  • On-device unlearning could be relevant for edge deployment scenarios where privacy-sensitive applications require local control.
  • The multi-model orchestration pattern (watchdog/surgeon/patient) is an interesting architectural contribution that could inspire similar agent-based model editing systems.
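The watchdog/surgeon/patient control flow can be sketched as a toy pipeline. Everything below is a stand-in: the keyword-based `watchdog`, the dictionary-backed `Patient`, and the closure returned by `surgeon` are hypothetical simplifications, and deleting a dictionary entry is not real unlearning. The sketch only shows how the three roles might hand off to each other.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

REFUSAL = "I cannot help with that."


@dataclass
class Patient:
    """Stand-in for the model being repaired; a dict replaces real weights."""
    knowledge: Dict[str, str] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        return self.knowledge.get(question, REFUSAL)


def watchdog(prompt: str) -> Optional[str]:
    """Toy intent detector: return the forget target, or None if no intent."""
    prefix = "forget "
    text = prompt.strip()
    if text.lower().startswith(prefix):
        return text[len(prefix):].strip()
    return None


def surgeon(target: str) -> Callable[[Patient], None]:
    """Toy repair generator: return a procedure that edits the patient."""
    def repair(patient: Patient) -> None:
        patient.knowledge.pop(target, None)
    return repair


def handle_prompt(prompt: str, patient: Patient) -> str:
    """Route a prompt: answer normally, or detect intent and apply a repair."""
    target = watchdog(prompt)
    if target is None:
        return patient.answer(prompt)
    surgeon(target)(patient)
    return f"Unlearned: {target}"
```

A usage example: after `handle_prompt("Forget capital of X?", patient)`, a later `handle_prompt("capital of X?", patient)` returns the refusal string in this toy setup.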
Broader influence:

  • If the IMU paradigm gains traction, it could influence how LLM providers design their APIs and deployment architectures.
  • The connection to test-time training literature opens a bridge between two previously separate research communities.
  • However, the security implications are concerning and insufficiently addressed: allowing users to arbitrarily modify model weights at inference time could be exploited for adversarial purposes (e.g., removing safety guardrails rather than personal data).
4. Timeliness & Relevance

The paper is highly timely. Machine unlearning for LLMs is an active area with growing regulatory pressure. The user-centric framing addresses a genuine gap: most existing methods assume provider-side access. The connection to GDPR/CCPA compliance makes this relevant beyond purely academic settings. The test-time adaptation angle is also currently popular, making this work well-positioned in the research zeitgeist.

5. Strengths & Limitations

Key Strengths:

  • Novel and well-motivated problem formulation (IMU)
  • Training-free operation with closed-form solutions is genuinely differentiating
  • Single-sample capability is unique among compared methods
  • Comprehensive evaluation across three distinct tasks
  • Clear computational complexity analysis with practical speedup demonstrations
  • The ~3× speedup over training-based baselines is meaningful for practical deployment
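To make the complexity claim concrete, the two operation counts can be compared numerically. The values `d = 4096` and `r = 64` are assumptions chosen for illustration, not figures from the paper, and constant factors are ignored. The resulting ratio is an asymptotic operation-count ratio and should not be confused with the reported ~3× wall-clock speedup over training-based baselines.

```python
def full_cost(d: int) -> int:
    """Operation count for the full-rank update, O(d^3), constants ignored."""
    return d ** 3


def low_rank_cost(d: int, r: int) -> int:
    """Operation count for the low-rank variant, O(r^3 + r^2 * d)."""
    return r ** 3 + r ** 2 * d


d, r = 4096, 64  # assumed hidden size and rank, for illustration only
ratio = full_cost(d) / low_rank_cost(d, r)
print(f"full: {full_cost(d):,}  low-rank: {low_rank_cost(d, r):,}  ratio: {ratio:.0f}x")
```

The saving depends on r being much smaller than d: at r = d the low-rank expression evaluates to 2·d³, which is worse than the full-rank count.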
Notable Weaknesses:

  • Security model is absent: No discussion of adversarial users who might exploit IMU to remove safety training, inject backdoors, or degrade model capabilities. This is a critical omission for a user-facing unlearning system.
  • Scalability questions: The framework requires three separate models running simultaneously (watchdog, surgeon, patient), which is resource-intensive and somewhat contradicts the "on-device" narrative.
  • Evaluation depth: Non-standard benchmark configurations limit comparability. The paper would benefit from evaluation on standard WMDP MCQ settings alongside the free-form setting.
  • Permanence and compositionality: No analysis of sequential unlearning requests — what happens when multiple unlearning operations accumulate? Does the pseudoinverse update compound errors?
  • Refusal quality: The paper conflates "forgetting" with "refusing to answer." True unlearning should make the model behave as if it never learned the information, not simply trigger refusal responses, which could be distinguished by sophisticated probing.
  • Limited model scale: Experiments only on 7B-8B models; scalability to larger models is unexplored.
  • The retain buffer requirement at inference time is a practical constraint that limits the user-driven narrative.
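The compositionality question can be probed with a toy experiment. The sketch below uses a generic minimum-norm rank-one update, not the paper's STAMP, and the activations and targets are made up. Under these assumptions, a second edit leaves the first intact when the two forget activations are orthogonal, and disturbs it when they overlap. Whether STAMP with a retain buffer behaves the same way is not something the review establishes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def rank_one_steer(W, a, target):
    """Minimum-norm update W' = W + (target - W a) a^T / (a^T a)."""
    sq = dot(a, a)
    residual = [t - y for t, y in zip(target, matvec(W, a))]
    return [[w + r * x / sq for w, x in zip(row, a)]
            for row, r in zip(W, residual)]


W0 = [[1.0, 0.0], [0.0, 1.0]]
t = [0.0, 1.0]                      # hypothetical refusal output
a1 = [1.0, 0.0]                     # first forget activation
W1 = rank_one_steer(W0, a1, t)      # first request: W1 a1 == t

# Second request with an overlapping activation disturbs the first edit.
a2_overlap = [1.0, 1.0]
W2_overlap = rank_one_steer(W1, a2_overlap, t)
drifted = matvec(W2_overlap, a1)    # no longer equal to t

# Second request with an orthogonal activation leaves the first edit intact.
a2_orth = [0.0, 1.0]
W2_orth = rank_one_steer(W1, a2_orth, [2.0, 3.0])
preserved = matvec(W2_orth, a1)     # still equal to t
```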
Additional Observations

The paper's venue (ACM MM '26) and the future date (2026) are noteworthy. The writing quality is generally good but could benefit from tighter organization in the experimental section. Some claims in the abstract ("near-zero forget scores") are well-supported, while others ("potential extensions to multimodal foundation models") remain speculative. The comparison against only six baselines, while reasonable, misses some relevant recent work in model editing (e.g., ROME, MEMIT) that could serve as additional comparisons for the activation-manipulation approach.

Rating: 5.8/10
Significance 6.5 · Rigor 5 · Novelty 7 · Clarity 6.5

Generated Apr 15, 2026

    Comparison History (62)

    vs. CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors
    gpt-5.2 · 5/5/2026

    Paper 2 is likely higher impact due to strong novelty (interactive, user-driven unlearning at inference time; training-free single-sample STAMP with closed-form updates), high timeliness (privacy, safety, regulation for LLMs), and broad applicability across ML/security/HCI/policy. It proposes a general framework (watchdog/surgeon/patient) and reports strong quantitative results against multiple baselines, suggesting methodological rigor and portability (on-device efficiency). Paper 1 is valuable for digital health biomarker workflows, but its impact is more domain-specific and shows modest predictive gains, with more limited cross-field reach.

    vs. SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
    claude-opus-4.6 · 5/5/2026

    SAVE addresses a fundamental challenge in computational biology—modeling single-cell gene expression across diverse conditions—with a novel gene block attention mechanism that captures higher-order biological relationships. Its applications span batch correction, perturbation prediction, and virtual cell synthesis, offering broad utility across biology and medicine. While RePAIR presents an interesting interactive unlearning paradigm for LLMs, SAVE's contributions to single-cell genomics have wider scientific impact potential given the rapidly growing importance of single-cell technologies in biomedical research, drug discovery, and precision medicine. The biological grounding and generalizable framework design give SAVE stronger cross-disciplinary reach.

    vs. CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors
    gpt-5.2 · 5/5/2026

    Paper 2 has higher likely impact due to strong novelty (user-driven, inference-time unlearning) and broad, timely relevance to safety, privacy, and regulation of widely deployed LLMs. The STAMP closed-form, training-free update with a low-rank acceleration is methodologically distinctive and practical (on-device), enabling real-world adoption across many applications and model providers. Its scope generalizes beyond one domain and could influence model editing, security, and compliance research. Paper 1 is valuable for digital health, but its impact is narrower, improvements are modest, and results are more application-specific.

    vs. SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
    claude-opus-4.6 · 5/5/2026

    SAVE addresses a fundamental challenge in single-cell genomics—modeling gene expression across diverse conditions—with a novel architecture combining gene block attention, flow matching, and condition masking. Its broad applicability across conditional generation, batch correction, and perturbation prediction in biology gives it high cross-disciplinary impact. While RePAIR introduces an interesting interactive unlearning paradigm, its scope is narrower (LLM safety/privacy) and builds incrementally on existing unlearning methods. SAVE's potential to enable virtual cell synthesis and biological discovery represents a more transformative scientific contribution.

    vs. The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break
    claude-opus-4.6 · 5/5/2026

    Paper 1 introduces a novel paradigm (Interactive Machine Unlearning) with a concrete, technically rigorous framework (RePAIR/STAMP) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. It offers a training-free, closed-form method with strong theoretical grounding and demonstrates clear empirical superiority over six baselines. Paper 2 provides a useful diagnostic benchmark for long-horizon agent failures, but is more incremental—primarily an evaluation framework rather than a solution. Paper 1's broader applicability (privacy, safety, misinformation), methodological novelty, and practical deployment potential give it higher impact.

    vs. MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models
    claude-opus-4.6 · 5/5/2026

    RePAIR introduces a novel paradigm (Interactive Machine Unlearning) with a concrete technical contribution (STAMP) that enables user-driven, training-free knowledge removal from LLMs. It addresses critical practical needs (privacy, harmful content removal) with strong empirical results and clear computational advantages. While MIRROR provides valuable benchmarking insights about metacognitive calibration, its contributions are primarily diagnostic rather than constructive. RePAIR's actionable framework with immediate real-world applications in privacy compliance, safety, and on-device deployment gives it broader and more transformative impact potential across multiple research communities.

    vs. Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching
    gpt-5.2 · 5/5/2026

    Paper 2 is likely higher impact due to stronger novelty and broader real-world relevance: user-driven, inference-time machine unlearning addresses urgent privacy/safety/regulatory needs and could apply across domains and modalities. The STAMP closed-form, training-free update is a distinctive technical contribution with practical on-device feasibility. Methodologically, it reports extensive evaluations across multiple unlearning settings with utility preservation and comparisons to several baselines. Paper 1 is valuable (benchmark + search improvement for tool use), but its impact is narrower to agent planning in tool libraries and less societally time-critical than unlearning/control.

    vs. Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement
    gemini-3 · 5/5/2026

    Paper 2 addresses machine unlearning, a critical bottleneck in LLM safety, privacy, and compliance. Its user-driven, training-free approach at inference time significantly advances AI alignment with broad applicability across AI domains. While Paper 1 offers strong innovations in hardware design automation (EDA), Paper 2's focus on foundational AI safety, misinformation, and data erasure provides a wider, more timely scientific impact that affects both end-users and model providers globally.

    vs. Emergence Transformer: Dynamical Temporal Attention Matters
    gpt-5.2 · 5/5/2026

    Paper 1 targets a timely, high-stakes problem (LLM unlearning for safety/privacy) with a concrete, user-facing paradigm and an efficient, training-free method (closed-form pseudoinverse, low-rank speedups) plus broad empirical evaluation against multiple baselines. Its real-world applicability (on-device, end-user control) and likely adoption in ML safety/model editing give strong near-term and cross-industry impact. Paper 2 is conceptually interesting for dynamical systems and attention, but appears less grounded in standard ML benchmarks/deployments, with impact more speculative and potentially narrower until validated at scale.

    vs. Introspection Adapters: Training LLMs to Report Their Learned Behaviors
    claude-opus-4.6 · 5/5/2026

    Paper 1 introduces a more novel and broadly applicable concept—training adapters that enable LLMs to introspect and report their own learned behaviors. This addresses the critical AI safety problem of auditing fine-tuned models at scale, with a creative methodology (joint training across finetunes) that generalizes to unseen models. Paper 2's interactive unlearning framework, while practical, builds more incrementally on existing unlearning literature. Paper 1's approach has broader implications for AI governance, safety auditing, and transparency, and its generalization results suggest deeper scientific insights about model representations.

    vs. Introspection Adapters: Training LLMs to Report Their Learned Behaviors
    gemini-3 · 5/5/2026

    Paper 2 introduces a paradigm shift from provider-centric to user-driven machine unlearning via natural language prompts. Its methodological contribution, the training-free STAMP algorithm, enables efficient, on-device unlearning with a closed-form solution, vastly reducing computational overhead. While Paper 1 offers a highly novel approach to auditing LLMs for hidden behaviors, Paper 2's democratization of model editing addresses pressing, wide-reaching issues like misinformation, privacy, and personal data erasure directly at the user level, giving it broader potential real-world applications and higher overall scientific impact.

    vs. Emergence Transformer: Dynamical Temporal Attention Matters
    claude-opus-4.6 · 5/5/2026

    Paper 1 introduces a fundamentally novel conceptual bridge between Transformer attention mechanisms and emergence phenomena in complex systems, with broad interdisciplinary impact spanning physics, neuroscience, climate science, and social dynamics. Its theoretical contribution—dynamical temporal attention enabling emergent coherence modulation—opens new research directions across multiple fields. The demonstration of continual learning without catastrophic forgetting via Hopfield networks adds practical AI value. Paper 2, while technically strong and practically useful for LLM unlearning, addresses a more incremental and narrower problem with less cross-disciplinary reach.

    vs. Self-Correction as Feedback Control: Error Dynamics, Stability Thresholds, and Prompt Interventions in LLMs
    gpt-5.2 · 5/5/2026

    Paper 2 likely has higher impact: it introduces a new user-facing paradigm (interactive unlearning at inference time) plus a concrete, efficient method (STAMP with low-rank acceleration) that enables practical on-device model editing—highly timely given privacy, safety, and regulatory pressures. The approach has broad applications (data erasure, misinformation, harmful content) and crosses security/privacy/ML systems. Paper 1 offers a valuable theoretical framing and diagnostic for self-correction, but its primary contribution is analysis and prompting guidance with narrower downstream leverage compared to scalable unlearning capabilities.

    vs. When to Forget: A Memory Governance Primitive
    claude-opus-4.6 · 5/5/2026

    Paper 1 introduces a novel paradigm (Interactive Machine Unlearning) with a comprehensive framework (RePAIR/STAMP) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. It demonstrates strong empirical results across multiple tasks, outperforming six baselines, and offers practical computational efficiency for on-device deployment. Paper 2 proposes a useful but narrower contribution (Memory Worth metric) for agent memory governance, validated primarily in synthetic settings. Paper 1 has broader impact across safety, privacy, and model editing, with stronger methodological depth and more immediate real-world applicability to the rapidly growing LLM ecosystem.

    vs. Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks
    gemini-3 · 5/5/2026

    Paper 2 addresses the critical and highly timely challenges of AI privacy, safety, and copyright through machine unlearning. By introducing a user-driven, training-free unlearning method at inference time, it democratizes model editing and offers a scalable solution to regulatory compliance (e.g., GDPR). This gives it broader real-world applicability and foundational importance across the AI community compared to Paper 1, which focuses on a narrower, albeit interesting, domain of spatial reasoning and agent benchmarks.

    vs. Human-Inspired Context-Selective Multimodal Memory for Social Robots
    claude-opus-4.6 · 5/5/2026

    RePAIR introduces a novel paradigm (Interactive Machine Unlearning) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. Its contributions span a new problem formulation, a training-free unlearning method (STAMP) with strong theoretical grounding, and practical efficiency gains enabling on-device deployment. The work has broader impact across AI safety, privacy regulation compliance, and model editing, with extensibility to multimodal models. Paper 1, while solid in its domain of social robotics memory, addresses a narrower application area with more incremental contributions combining existing techniques.

    vs. Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
    gemini-3 · 5/5/2026

    Paper 1 addresses a critical and widespread challenge in LLM deployment: unlearning harmful or private data. Its introduction of a training-free, user-driven interactive unlearning method provides significant algorithmic innovation and broad real-world applicability across privacy and AI safety. Paper 2, while timely for AI alignment, focuses on a niche benchmark for autonomous research sabotage, limiting its immediate breadth of impact compared to the core capability advancements in Paper 1.

    vs. The World Leaks the Future: Harness Evolution for Future Prediction Agents
    gpt-5.2 · 5/5/2026

    Paper 1 is more novel and broadly impactful: it introduces an end-user, inference-time unlearning paradigm with a concrete, training-free parameter update method (STAMP) and efficiency improvements enabling on-device control—highly timely for privacy, safety, and regulatory compliance. Its methodological contribution (closed-form pseudoinverse/low-rank updates) is likely reusable across model editing and alignment, with clear real-world applications and strong empirical evaluation against multiple baselines. Paper 2 is promising for forecasting agents but is narrower in application scope and seems more like system/prompt-harness engineering with less foundational methodological innovation.

    vs. The Query Channel: Information-Theoretic Limits of Masking-Based Explanations
    claude-opus-4.6 · 5/5/2026

    Paper 2 introduces a novel paradigm (Interactive Machine Unlearning) that addresses urgent practical needs around data privacy, harmful content removal, and user control over LLMs. Its training-free STAMP method with closed-form updates is technically innovative and practically deployable. The work has broader real-world impact across privacy regulation compliance, safety, and misinformation control, with strong empirical results. Paper 1, while theoretically elegant in connecting explainability to information theory, addresses a more niche theoretical question with narrower immediate applicability. Paper 2's timeliness regarding LLM safety and privacy concerns gives it higher potential impact.

    vs. Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
    claude-opus-4.6 · 4/21/2026

    Paper 1 introduces a novel, fundamental insight into how MoE models organize computation—separating control signals from content channels—revealing that expert paths (not individual experts) are the natural unit of interpretability. This parameter-free decomposition validated across six architectures provides deep mechanistic understanding with broad implications for interpretability research. Paper 2, while practically useful, addresses machine unlearning with an incremental engineering contribution (combining known techniques like activation manipulation with pseudoinverse updates). Paper 1's conceptual contribution is more likely to reshape how the field thinks about MoE architectures and model interpretability.