Bocheng Ju, Jianhua Wang, Chengliang Liu, Xiaolin Chang
Large language model unlearning aims to suppress designated undesirable knowledge while preserving benign capabilities. Many unlearning objectives focus on suppressing undesired answers, while recent target-guided variants specify replacement behavior but still leave update locality largely unconstrained. This paper introduces \emph{Null-Space Constrained Response-Specified Unlearning} (NSRU), a projection-constrained low-rank framework for controlled LLM unlearning. NSRU uses an explicitly structured safe target response to specify the desired behavior for each forget query, while suppressing the original undesired content. To localize adaptation, NSRU estimates per-module retain subspaces from benign hidden representations and uses an orthogonal-projected low-rank parameterization to confine LoRA updates to the null space of the retain subspace. The resulting objective jointly optimizes safe-target learning, undesired-response suppression, and retention preservation under this constrained parameterization. We provide a local first-order analysis showing that the projected update reduces retain-side perturbations while preserving editable directions for shaping forget-query behavior. Experiments on TOFU show that NSRU effectively suppresses extractable forget-set knowledge while improving retain QA performance, model utility, and safe-target alignment over representative baselines. On WMDP, NSRU keeps hazardous-domain accuracy near the random-choice region while preserving broad and domain-adjacent MMLU utility. Ablation studies support the complementary roles of safe-target supervision, undesired-response suppression, retention loss, and null-space projected updates, while sensitivity and robustness analyses indicate stable behavior across the tested hyperparameter and prompt variations.
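A generic first-order bound helps show why projecting module inputs limits retain-side perturbation while leaving other directions editable. This is a sketch under stated assumptions, not the paper's own analysis. The symbols are illustrative: W is a frozen module weight, B and A are LoRA factors, U_k is a d × k matrix with orthonormal columns spanning the estimated retain subspace, h_i is a retain-side module input, and g_i is the gradient of the retain loss with respect to that module's output.

```latex
\begin{aligned}
\Delta W_{\mathrm{eff}} &= B A P, \qquad P = I - U_k U_k^{\top}, \\
h_r &= U_k c + \varepsilon,\quad \varepsilon \perp \operatorname{span}(U_k)
  \;\Longrightarrow\; P h_r = \varepsilon, \\
\|\Delta W_{\mathrm{eff}}\, h_r\|_2 &= \|B A \varepsilon\|_2 \le \|B A\|_2\, \|\varepsilon\|_2, \\
|\delta \mathcal{L}_r| &\approx \Big|\sum_i g_i^{\top} \Delta W_{\mathrm{eff}}\, h_i\Big|
  \le \|B A\|_2 \sum_i \|g_i\|_2\, \|P h_i\|_2 .
\end{aligned}
```

When benign inputs lie almost entirely in span(U_k), each ||P h_i|| is small, so the first-order change in retain loss is correspondingly small. Forget-query inputs with substantial energy outside span(U_k) still pass through the full low-rank map B A.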
NSRU proposes a unified framework for LLM unlearning that couples three previously separate ideas: (1) response-specified unlearning with explicit safe target responses, (2) suppression of the original undesired response, and (3) null-space projected LoRA updates that confine parameter changes to directions orthogonal to an estimated retain subspace. The key insight is that unlearning should be treated as a *constrained adaptation* problem, one that specifies both *what* the model should output after unlearning and *where* in parameter space updates are permitted.
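The three-part objective can be sketched as a weighted sum of per-sequence terms. This is a minimal illustration, not the paper's loss. It assumes the model's per-token probabilities are already available, uses mean negative log-likelihood for both safe-target learning and retention, and swaps in a standard unlikelihood penalty, -log(1 - p), as the suppression term. The weights `lam_safe`, `lam_sup`, and `lam_ret` are hypothetical.

```python
import math
from typing import Sequence


def nll(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of a sequence, given the model's
    probability for each target token (all probabilities must be > 0)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def unlikelihood(token_probs: Sequence[float], eps: float = 1e-8) -> float:
    """Mean unlikelihood penalty -log(1 - p) over tokens of the undesired
    response; it grows as the model puts more mass on those tokens.
    (Swapped-in suppression term; not necessarily the one NSRU uses.)"""
    return -sum(math.log(max(1.0 - p, eps)) for p in token_probs) / len(token_probs)


def joint_unlearning_loss(
    safe_target_probs: Sequence[float],
    undesired_probs: Sequence[float],
    retain_probs: Sequence[float],
    lam_safe: float = 1.0,  # hypothetical weight: safe-target learning
    lam_sup: float = 1.0,   # hypothetical weight: undesired-response suppression
    lam_ret: float = 1.0,   # hypothetical weight: retention preservation
) -> float:
    return (
        lam_safe * nll(safe_target_probs)
        + lam_sup * unlikelihood(undesired_probs)
        + lam_ret * nll(retain_probs)
    )


# Lowering the probability of the undesired answer lowers the loss,
# with the safe-target and retain terms held fixed.
before = joint_unlearning_loss([0.5, 0.6], [0.9, 0.8], [0.7, 0.9])
after = joint_unlearning_loss([0.5, 0.6], [0.1, 0.2], [0.7, 0.9])
print(f"before={before:.3f} after={after:.3f}")
```

In practice these terms would be computed from model logits on batches of forget queries (paired with safe targets and original answers) and benign retain examples, and minimized only over the constrained LoRA parameters.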
The method estimates per-module retain subspaces via SVD of benign hidden representations, constructs orthogonal projectors, and applies LoRA updates only through the null-space component of module inputs. This is a clean formulation that limits retain-side interference at the representation level, rather than relying solely on loss-based regularization.
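A NumPy sketch of the subspace estimation and input-side projection described above follows. It is an illustration under assumptions, not the paper's implementation: the rank is chosen by a hypothetical cumulative-energy threshold, the projector is applied to module inputs before the LoRA factor A, and the benign inputs are synthetic data confined to a known 4-dimensional subspace. B is drawn at random here only so the demo produces a visible update (standard LoRA initializes B to zero).

```python
import numpy as np


def retain_basis(hidden: np.ndarray, energy: float = 0.99) -> np.ndarray:
    """Orthonormal basis (d x k) for the span of benign module inputs.

    `hidden` is (n_samples, d). k is the smallest rank whose singular
    values capture `energy` of the total squared spectrum (assumed rule).
    """
    _, s, vt = np.linalg.svd(hidden, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    return vt[:k].T


def null_space_projector(u: np.ndarray) -> np.ndarray:
    """P = I - U U^T removes the component of an input lying in span(U)."""
    return np.eye(u.shape[0]) - u @ u.T


def projected_lora_forward(x, w, a, b, p, scale=1.0):
    """y = W x + scale * B A (P x), written for row-major batches x: (n, d_in).

    w: (d_out, d_in) frozen weight; a: (r, d_in); b: (d_out, r); p: (d_in, d_in).
    """
    return x @ w.T + scale * (x @ p.T) @ a.T @ b.T


rng = np.random.default_rng(0)
d, k_true, d_out, r = 16, 4, 8, 2
basis = np.linalg.qr(rng.normal(size=(d, k_true)))[0]   # synthetic retain directions
hidden = rng.normal(size=(200, k_true)) @ basis.T       # synthetic benign inputs

u = retain_basis(hidden, energy=0.999)
p = null_space_projector(u)
w = rng.normal(size=(d_out, d))
a = rng.normal(size=(r, d))
b = rng.normal(size=(d_out, r))                          # random only for the demo

x_retain = rng.normal(size=(5, k_true)) @ basis.T        # inputs inside the retain span
x_forget = rng.normal(size=(5, d)) @ p                   # hypothetical inputs outside it

retain_shift = projected_lora_forward(x_retain, w, a, b, p) - x_retain @ w.T
forget_shift = projected_lora_forward(x_forget, w, a, b, p) - x_forget @ w.T
print("max retain-side change:", np.abs(retain_shift).max())
print("max forget-side change:", np.abs(forget_shift).max())
```

The retain-side change is at floating-point noise level because P maps inputs in the estimated subspace to zero, while inputs with components outside that subspace still receive the low-rank edit. Real hidden states are only approximately low-rank, so the rank threshold trades retain protection against the number of editable directions.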
The paper addresses a genuinely important problem: how to make LLM unlearning both *controllable* (specifying replacement behavior) and *localized* (avoiding collateral damage to retained capabilities). The retain-side improvements are substantial — on TOFU-Forget05, R-ROUGE jumps from 0.4651 (TRU) to 0.9626, and MU from 0.4845 to 0.6863. On WMDP, NSRU is the only method that suppresses hazardous accuracy near random while preserving MMLU close to the base model (0.5652 vs. 0.5772 on Bio).
Real-world relevance: The framework is directly applicable to compliance-driven unlearning (e.g., the GDPR right to be forgotten), safety alignment for hazardous knowledge, and content moderation. The ability to specify safe replacement responses rather than just suppress answers is practically important for deployment.
Broader influence: The null-space projection approach connects to continual learning, model editing, and parameter-efficient fine-tuning. The OPLoRA-style projection adapted for unlearning could inspire similar constrained adaptation approaches in other editing/alignment tasks.
LLM unlearning is a timely topic given regulatory pressures (GDPR, AI Act) and safety concerns around hazardous knowledge in open-weight models. The paper directly addresses two current bottlenecks: (1) most unlearning methods cause significant utility degradation, and (2) post-unlearning behavior is often uncontrolled. The work builds on very recent references (2024-2026), positioning it well within the rapidly evolving literature.
The ablation study (Table IV) is particularly informative: removing undesired-response suppression causes ES to spike to 0.8304 while ST-ROUGE remains high at 0.8719, demonstrating that surface-level safe-response generation can coexist with fully extractable original knowledge. This finding alone is a useful contribution to understanding unlearning evaluation.
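A small ROUGE-L recall sketch illustrates why a high overlap score with the safe target does not by itself show that the original knowledge is gone. It assumes ROUGE-L recall over lowercased whitespace tokens (official implementations also normalize punctuation and may apply stemming; the paper's exact ST-ROUGE variant is not stated here), and the example sentences, including the leaked "fact", are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        cur = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    """LCS length divided by reference length, on lowercased whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0


safe_target = "I cannot share details about that author."
leaky_output = "I cannot share details about that author. Her birthplace is Lisbon."
print(rouge_l_recall(leaky_output, safe_target))  # 1.0
```

Because recall only asks how much of the reference appears in the output, an answer that reproduces the safe target and then leaks extra content still scores perfectly, which is consistent with the ablation's point that extraction-style metrics are needed alongside safe-target overlap.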
The paper is well-written with clear notation (Table I is helpful) and a logical flow from problem formulation to method to experiments. Reproducibility appears reasonable given the detailed hyperparameter reporting and use of public benchmarks.
Generated Jun 10, 2026
Paper 2 targets timely, high-impact challenges in LLM safety/compliance (unlearning with minimal capability loss) with clear real-world applicability and broad cross-field relevance. Its method (null-space constrained LoRA updates guided by safe target responses) is a concrete, scalable mechanism likely to be adopted in practice, supported by analyses, ablations, and evaluations on prominent benchmarks (TOFU, WMDP, MMLU). Paper 1 is novel and rigorous but more specialized (adversarial robustness in submodular summarization) with narrower immediate deployment impact and community reach.
Paper 1 presents a technically novel method (NSRU) addressing the important and timely problem of LLM unlearning with a principled null-space projection approach, supported by theoretical analysis and comprehensive experiments on established benchmarks. It has broader applicability across the rapidly growing LLM safety field. Paper 2, while relevant to AI regulation, addresses a narrower legal/regulatory interpretation question specific to the EU AI Act's definition of inference, with limited technical novelty and a smaller potential audience primarily in AI policy rather than broader scientific research.
Paper 1 addresses a critical bottleneck in deploying autonomous LLM agents: managing long-horizon memory efficiently. Its introduction of a hierarchical, file-system-like memory structure significantly reduces token usage and latency while preserving reasoning quality. This architectural innovation has broad, immediate real-world applications across numerous domains involving autonomous agents, giving it a higher potential for widespread impact and adoption compared to the more specialized, safety-focused unlearning methodology of Paper 2.
Paper 1 has higher potential impact due to a more novel, methodologically grounded contribution: a principled null-space constrained LoRA mechanism for response-specified unlearning with theoretical first-order analysis and evaluation on key unlearning/safety benchmarks (TOFU, WMDP) tied to urgent deployment concerns. Its approach is broadly relevant across safety, privacy, model editing, and continual learning. Paper 2 is timely and practically useful for agentic web search, but the core ideas (tree search with UCB-like selection and memory) are closer to established planning/bandit/control heuristics and likely more incremental, with narrower cross-field implications.
Paper 2 introduces a much-needed benchmark for long-horizon, professional GUI tasks, addressing a critical bottleneck in the rapidly expanding field of autonomous AI agents. By revealing significant limitations in state-of-the-art models, it provides a foundational evaluation framework that will likely drive broad future research. Paper 1 offers a valuable but highly specific methodological refinement for LLM unlearning, which, while important for safety, has a narrower scope compared to the foundational impact of a novel, challenging benchmark in agentic AI.
Paper 2 is likely to have higher scientific impact because it standardizes evaluation across tabular representation-learning paradigms with a comprehensive benchmark, datasets, and protocol that can be broadly reused by the community. This supports methodological rigor, reproducibility, and wide applicability across ML, data management, and industry tabular problems, making it a durable reference point. Paper 1 is innovative and timely for LLM safety/unlearning, but its impact may be narrower (specific to unlearning/LoRA settings) and more sensitive to shifting unlearning benchmarks and threat models compared to a widely adopted evaluation standard.
Paper 1 addresses LLM unlearning, a timely and broadly impactful problem in AI safety with significant real-world applications (privacy, harmful knowledge removal). It presents a principled method (NSRU) with strong theoretical grounding and comprehensive experiments on established benchmarks (TOFU, WMDP). Paper 2 tackles a niche problem in spatial memory for language agents, with contributions that are self-admittedly 'near-tautological' and remain at the pilot stage with confirmatory studies left as future work. Paper 1's broader relevance to AI safety, methodological rigor, and completeness give it substantially higher impact potential.
Paper 2 addresses a fundamental challenge in multimodal AI by eliminating the need for paired data, significantly broadening the applicability of cross-modal distillation. Its strong theoretical foundation and applicability across various modalities suggest a wider impact compared to Paper 1, which focuses on a specific, albeit timely, technical optimization for LLM unlearning. The ability to leverage unpaired data for multimodal training has profound implications for resource-efficient AI across diverse domains.
CIAware-Bench (Paper 2) addresses a novel and timely problem, namely whether frontier LLMs can detect control interventions, which is critical for AI safety and alignment. It introduces a new benchmark across multiple task domains and evaluates eleven frontier models, providing broadly useful infrastructure for the community. While Paper 1 makes solid contributions to LLM unlearning with a well-engineered method (NSRU), it is more incremental, combining existing techniques (LoRA, null-space projection) in a specific application area. Paper 2 opens a new research direction with broader implications for AI governance and deployment safety.
Paper 1 proposes a highly innovative latent memory paradigm that addresses a major bottleneck in RAG systems: token consumption and context window limits. By compressing multimodal evidence into single latent tokens, it reduces token usage by 3x-10x while maintaining performance. This offers immense practical value and broader applicability across any resource-constrained LLM/VLM system compared to Paper 2's unlearning method, which, while rigorous and important for AI safety, addresses a slightly more specialized domain.