RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla
Abstract
Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data. We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments across harmful knowledge suppression, misinformation correction, and personal data erasure demonstrate that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, advancing transparent and on-device control over learned knowledge, with potential extensions to multimodal foundation models.
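As a rough illustration of the complexity claim in the abstract, the sketch below compares the leading-order terms d^3 and r^3 + r^2 * d for one hypothetical setting. The values d = 4096 and r = 16 and the function names are assumptions chosen for illustration and do not come from the paper. These are term counts only, not measured runtimes, and they are unrelated to the ~3x speedup, which the abstract reports against training-based baselines.

```python
def full_cost(d: int) -> int:
    """Leading-order term for the full update: d^3."""
    return d ** 3


def low_rank_cost(d: int, r: int) -> int:
    """Leading-order term for the low-rank variant: r^3 + r^2 * d."""
    return r ** 3 + r ** 2 * d


# Hypothetical sizes (assumptions, not values reported in the paper).
d, r = 4096, 16

ratio = full_cost(d) / low_rank_cost(d, r)
print(f"d^3           = {full_cost(d):,}")
print(f"r^3 + r^2 * d = {low_rank_cost(d, r):,}")
print(f"ratio         = {ratio:,.0f}")
```

For these assumed sizes the low-rank term is smaller by a factor of roughly 6.5 x 10^4. The gap shrinks as r approaches d.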
AI Impact Assessments (3 models)
Scientific Impact Assessment: RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
1. Core Contribution
The paper introduces Interactive Machine Unlearning (IMU), a paradigm shift from provider-centric to user-centric machine unlearning. Rather than requiring model service providers to run retraining pipelines, users can issue natural language instructions to remove targeted knowledge at inference time. The proposed framework, RePAIR, orchestrates three models: a watchdog (intent detection), a surgeon (code generation), and a patient (the model being repaired). The technical core is STAMP, a training-free, single-sample unlearning method that uses closed-form pseudoinverse updates to redirect MLP activations toward a refusal subspace. A low-rank variant (STAMP-LR) reduces computational complexity from O(d³) to O(r³ + r²·d).
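To make the closed-form idea concrete, here is a minimal sketch. It is not the paper's STAMP implementation. It shows a textbook minimum-norm rank-one update that uses the pseudoinverse of a single activation vector to redirect one linear layer's output to a chosen target. The names `min_norm_redirect`, `v_refusal`, and `k_orth`, and the toy 2x3 weight matrix, are hypothetical. This summary does not specify how STAMP selects layers, constructs the refusal subspace, or protects retained knowledge.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def min_norm_redirect(W, k, v_target):
    """Return W_new = W + (v_target - W k) k^+ so that W_new k == v_target.

    For a single nonzero vector k, the pseudoinverse is k^T / (k^T k),
    which makes this the smallest (Frobenius-norm) change to W that
    maps k onto v_target. No training loop is involved.
    """
    kk = sum(x * x for x in k)
    if kk == 0:
        raise ValueError("k must be nonzero")
    residual = [t - o for t, o in zip(v_target, matvec(W, k))]
    return [[w + r * x / kk for w, x in zip(row, k)]
            for row, r in zip(W, residual)]


def close(a, b):
    """Element-wise comparison with a small absolute tolerance."""
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))


# Toy example (all values are assumptions for illustration).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 2.0]        # activation for the single "forget" sample
v_refusal = [0.0, 3.0]     # stand-in for a refusal-direction output
k_orth = [2.0, -1.0, 0.0]  # an input orthogonal to k

W_new = min_norm_redirect(W, k, v_refusal)
print(close(matvec(W_new, k), v_refusal))                # forget input is redirected
print(close(matvec(W_new, k_orth), matvec(W, k_orth)))   # orthogonal input is unchanged
```

In this sketch the update changes the layer only along the direction of `k`, so inputs orthogonal to `k` produce the same output as before. Inputs that overlap with `k` are affected in proportion to that overlap, which is one reason a real method would need additional care to preserve utility.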
The problem formulation is genuinely novel — no prior work frames unlearning as an interactive, user-driven process at inference time. The decomposition into "when/what/how to unlearn" is clean and well-motivated by privacy regulations (GDPR, CCPA).
2. Methodological Rigor
Strengths in methodology:
Concerns:
3. Potential Impact
Practical applications:
Broader influence:
4. Timeliness & Relevance
The paper is highly timely. Machine unlearning for LLMs is an active area with growing regulatory pressure. The user-centric framing addresses a genuine gap — most existing methods assume provider-side access. The connection to GDPR/CCPA compliance makes this relevant beyond purely academic settings. The test-time adaptation angle is also currently popular, making this work well-positioned in the research zeitgeist.
5. Strengths & Limitations
Key Strengths:
Notable Weaknesses:
Additional Observations
The paper's venue (ACM MM '26) and the future date (2026) are noteworthy. The writing quality is generally good but could benefit from tighter organization in the experimental section. Some claims in the abstract ("near-zero forget scores") are well-supported, while others ("potential extensions to multimodal foundation models") remain speculative. The comparison against only six baselines, while reasonable, misses some relevant recent work in model editing (e.g., ROME, MEMIT) that could serve as additional comparisons for the activation-manipulation approach.
Generated Apr 15, 2026
Comparison History (62)
Paper 2 is likely higher impact due to strong novelty (interactive, user-driven unlearning at inference time; training-free single-sample STAMP with closed-form updates), high timeliness (privacy, safety, regulation for LLMs), and broad applicability across ML/security/HCI/policy. It proposes a general framework (watchdog/surgeon/patient) and reports strong quantitative results against multiple baselines, suggesting methodological rigor and portability (on-device efficiency). Paper 1 is valuable for digital health biomarker workflows, but its impact is more domain-specific and shows modest predictive gains, with more limited cross-field reach.
SAVE addresses a fundamental challenge in computational biology—modeling single-cell gene expression across diverse conditions—with a novel gene block attention mechanism that captures higher-order biological relationships. Its applications span batch correction, perturbation prediction, and virtual cell synthesis, offering broad utility across biology and medicine. While RePAIR presents an interesting interactive unlearning paradigm for LLMs, SAVE's contributions to single-cell genomics have wider scientific impact potential given the rapidly growing importance of single-cell technologies in biomedical research, drug discovery, and precision medicine. The biological grounding and generalizable framework design give SAVE stronger cross-disciplinary reach.
Paper 2 has higher likely impact due to strong novelty (user-driven, inference-time unlearning) and broad, timely relevance to safety, privacy, and regulation of widely deployed LLMs. The STAMP closed-form, training-free update with a low-rank acceleration is methodologically distinctive and practical (on-device), enabling real-world adoption across many applications and model providers. Its scope generalizes beyond one domain and could influence model editing, security, and compliance research. Paper 1 is valuable for digital health, but its impact is narrower, improvements are modest, and results are more application-specific.
SAVE addresses a fundamental challenge in single-cell genomics—modeling gene expression across diverse conditions—with a novel architecture combining gene block attention, flow matching, and condition masking. Its broad applicability across conditional generation, batch correction, and perturbation prediction in biology gives it high cross-disciplinary impact. While RePAIR introduces an interesting interactive unlearning paradigm, its scope is narrower (LLM safety/privacy) and builds incrementally on existing unlearning methods. SAVE's potential to enable virtual cell synthesis and biological discovery represents a more transformative scientific contribution.
Paper 1 introduces a novel paradigm (Interactive Machine Unlearning) with a concrete, technically rigorous framework (RePAIR/STAMP) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. It offers a training-free, closed-form method with strong theoretical grounding and demonstrates clear empirical superiority over six baselines. Paper 2 provides a useful diagnostic benchmark for long-horizon agent failures, but is more incremental—primarily an evaluation framework rather than a solution. Paper 1's broader applicability (privacy, safety, misinformation), methodological novelty, and practical deployment potential give it higher impact.
RePAIR introduces a novel paradigm (Interactive Machine Unlearning) with a concrete technical contribution (STAMP) that enables user-driven, training-free knowledge removal from LLMs. It addresses critical practical needs (privacy, harmful content removal) with strong empirical results and clear computational advantages. While MIRROR provides valuable benchmarking insights about metacognitive calibration, its contributions are primarily diagnostic rather than constructive. RePAIR's actionable framework with immediate real-world applications in privacy compliance, safety, and on-device deployment gives it broader and more transformative impact potential across multiple research communities.
Paper 2 is likely higher impact due to stronger novelty and broader real-world relevance: user-driven, inference-time machine unlearning addresses urgent privacy/safety/regulatory needs and could apply across domains and modalities. The STAMP closed-form, training-free update is a distinctive technical contribution with practical on-device feasibility. Methodologically, it reports extensive evaluations across multiple unlearning settings with utility preservation and comparisons to several baselines. Paper 1 is valuable (benchmark + search improvement for tool use), but its impact is narrower to agent planning in tool libraries and less societally time-critical than unlearning/control.
Paper 2 addresses machine unlearning, a critical bottleneck in LLM safety, privacy, and compliance. Its user-driven, training-free approach at inference time significantly advances AI alignment with broad applicability across AI domains. While Paper 1 offers strong innovations in hardware design automation (EDA), Paper 2's focus on foundational AI safety, misinformation, and data erasure provides a wider, more timely scientific impact that affects both end-users and model providers globally.
Paper 1 targets a timely, high-stakes problem (LLM unlearning for safety/privacy) with a concrete, user-facing paradigm and an efficient, training-free method (closed-form pseudoinverse, low-rank speedups) plus broad empirical evaluation against multiple baselines. Its real-world applicability (on-device, end-user control) and likely adoption in ML safety/model editing give strong near-term and cross-industry impact. Paper 2 is conceptually interesting for dynamical systems and attention, but appears less grounded in standard ML benchmarks/deployments, with impact more speculative and potentially narrower until validated at scale.
Paper 1 introduces a more novel and broadly applicable concept—training adapters that enable LLMs to introspect and report their own learned behaviors. This addresses the critical AI safety problem of auditing fine-tuned models at scale, with a creative methodology (joint training across finetunes) that generalizes to unseen models. Paper 2's interactive unlearning framework, while practical, builds more incrementally on existing unlearning literature. Paper 1's approach has broader implications for AI governance, safety auditing, and transparency, and its generalization results suggest deeper scientific insights about model representations.
Paper 2 introduces a paradigm shift from provider-centric to user-driven machine unlearning via natural language prompts. Its methodological contribution, the training-free STAMP algorithm, enables efficient, on-device unlearning with a closed-form solution, vastly reducing computational overhead. While Paper 1 offers a highly novel approach to auditing LLMs for hidden behaviors, Paper 2's democratization of model editing addresses pressing, wide-reaching issues like misinformation, privacy, and personal data erasure directly at the user level, giving it broader potential real-world applications and higher overall scientific impact.
Paper 1 introduces a fundamentally novel conceptual bridge between Transformer attention mechanisms and emergence phenomena in complex systems, with broad interdisciplinary impact spanning physics, neuroscience, climate science, and social dynamics. Its theoretical contribution—dynamical temporal attention enabling emergent coherence modulation—opens new research directions across multiple fields. The demonstration of continual learning without catastrophic forgetting via Hopfield networks adds practical AI value. Paper 2, while technically strong and practically useful for LLM unlearning, addresses a more incremental and narrower problem with less cross-disciplinary reach.
Paper 2 likely has higher impact: it introduces a new user-facing paradigm (interactive unlearning at inference time) plus a concrete, efficient method (STAMP with low-rank acceleration) that enables practical on-device model editing—highly timely given privacy, safety, and regulatory pressures. The approach has broad applications (data erasure, misinformation, harmful content) and crosses security/privacy/ML systems. Paper 1 offers a valuable theoretical framing and diagnostic for self-correction, but its primary contribution is analysis and prompting guidance with narrower downstream leverage compared to scalable unlearning capabilities.
Paper 1 introduces a novel paradigm (Interactive Machine Unlearning) with a comprehensive framework (RePAIR/STAMP) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. It demonstrates strong empirical results across multiple tasks, outperforming six baselines, and offers practical computational efficiency for on-device deployment. Paper 2 proposes a useful but narrower contribution (Memory Worth metric) for agent memory governance, validated primarily in synthetic settings. Paper 1 has broader impact across safety, privacy, and model editing, with stronger methodological depth and more immediate real-world applicability to the rapidly growing LLM ecosystem.
Paper 2 addresses the critical and highly timely challenges of AI privacy, safety, and copyright through machine unlearning. By introducing a user-driven, training-free unlearning method at inference time, it democratizes model editing and offers a scalable solution to regulatory compliance (e.g., GDPR). This gives it broader real-world applicability and foundational importance across the AI community compared to Paper 1, which focuses on a narrower, albeit interesting, domain of spatial reasoning and agent benchmarks.
RePAIR introduces a novel paradigm (Interactive Machine Unlearning) that addresses a critical and timely problem—user-driven selective knowledge removal from LLMs. Its contributions span a new problem formulation, a training-free unlearning method (STAMP) with strong theoretical grounding, and practical efficiency gains enabling on-device deployment. The work has broader impact across AI safety, privacy regulation compliance, and model editing, with extensibility to multimodal models. Paper 1, while solid in its domain of social robotics memory, addresses a narrower application area with more incremental contributions combining existing techniques.
Paper 1 addresses a critical and widespread challenge in LLM deployment: unlearning harmful or private data. Its introduction of a training-free, user-driven interactive unlearning method provides significant algorithmic innovation and broad real-world applicability across privacy and AI safety. Paper 2, while timely for AI alignment, focuses on a niche benchmark for autonomous research sabotage, limiting its immediate breadth of impact compared to the core capability advancements in Paper 1.
Paper 1 is more novel and broadly impactful: it introduces an end-user, inference-time unlearning paradigm with a concrete, training-free parameter update method (STAMP) and efficiency improvements enabling on-device control—highly timely for privacy, safety, and regulatory compliance. Its methodological contribution (closed-form pseudoinverse/low-rank updates) is likely reusable across model editing and alignment, with clear real-world applications and strong empirical evaluation against multiple baselines. Paper 2 is promising for forecasting agents but is narrower in application scope and seems more like system/prompt-harness engineering with less foundational methodological innovation.
Paper 2 introduces a novel paradigm (Interactive Machine Unlearning) that addresses urgent practical needs around data privacy, harmful content removal, and user control over LLMs. Its training-free STAMP method with closed-form updates is technically innovative and practically deployable. The work has broader real-world impact across privacy regulation compliance, safety, and misinformation control, with strong empirical results. Paper 1, while theoretically elegant in connecting explainability to information theory, addresses a more niche theoretical question with narrower immediate applicability. Paper 2's timeliness regarding LLM safety and privacy concerns gives it higher potential impact.
Paper 1 introduces a novel, fundamental insight into how MoE models organize computation—separating control signals from content channels—revealing that expert paths (not individual experts) are the natural unit of interpretability. This parameter-free decomposition validated across six architectures provides deep mechanistic understanding with broad implications for interpretability research. Paper 2, while practically useful, addresses machine unlearning with an incremental engineering contribution (combining known techniques like activation manipulation with pseudoinverse updates). Paper 1's conceptual contribution is more likely to reshape how the field thinks about MoE architectures and model interpretability.