Nazreen Shah, Govinda Arya, Bharath B. N., Ranjitha Prasad
In many real-world settings, data streams are nonstationary and arrive sequentially, requiring learning systems to adapt continuously without retraining from scratch. Continual learning (CL) addresses this challenge by incorporating new tasks while mitigating catastrophic forgetting, where learning new information degrades performance on previously acquired knowledge. We introduce a control-theoretic perspective on CL that explicitly regulates the evolution of forgetting, framing adaptation as a controlled process subject to long-term stability constraints. We focus on replay-based CL, where a finite memory buffer stores representative samples from prior tasks. We propose COntinual Learning with Drift-Plus-Penalty (COLD), a continual learning framework based on the Drift-Plus-Penalty (DPP) principle from stochastic optimization. To facilitate analysis, we also consider an oracle variant, COLD-ORACLE, as a reference benchmark. At each task, both methods minimize the current task loss while maintaining a virtual queue that tracks deviations from long-term stability on previously learned tasks, capturing the stability-plasticity trade-off as a regulated dynamical process. We establish stability and convergence guarantees that characterize this trade-off through a tunable control parameter. Experiments on standard benchmarks demonstrate that COLD consistently outperforms a broad range of state-of-the-art CL methods while providing competitive and controllable forgetting behavior through explicit regulation of stability and plasticity.
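To make the mechanism in the abstract concrete, here is a minimal, hypothetical sketch of a single drift-plus-penalty step in pure Python. It assumes, and this is our assumption rather than the paper's stated formulation, that forgetting is measured as the increase of a replay loss over its value at a reference model (the previous model, as in COLD), that δ is a tolerance on that increase, and that the model takes one gradient step on V·(current-task loss) + Q·(forgetting − δ) before the scalar queue Q is updated. The names `dpp_step`, `new_task_grad`, `replay_forgetting` and the quadratic toy losses are illustrative only, not the authors' code.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dpp_step(
    theta: Vector,
    queue: float,
    task_grad: Callable[[Vector], Vector],
    forgetting: Callable[[Vector], Tuple[float, Vector]],
    V: float,
    delta: float,
    lr: float,
) -> Tuple[Vector, float]:
    """One hypothetical drift-plus-penalty step (illustrative, not the authors' code).

    Descends V * current_loss(theta) + queue * (forgetting(theta) - delta),
    then updates the virtual queue with the new constraint violation.
    """
    g_task = task_grad(theta)
    _, g_forget = forgetting(theta)
    # The constant -delta has zero gradient, so only the forgetting gradient is queue-weighted.
    new_theta = [
        t - lr * (V * gt + queue * gf) for t, gt, gf in zip(theta, g_task, g_forget)
    ]
    violation = forgetting(new_theta)[0] - delta
    # Virtual queue update: Q <- max(Q + violation, 0)
    return new_theta, max(queue + violation, 0.0)


# Hypothetical toy: the old task's optimum c_old doubles as the reference model.
c_old = [0.0, 0.0]
c_new = [1.0, 0.0]


def new_task_grad(theta: Vector) -> Vector:
    # Gradient of the new-task loss ||theta - c_new||^2
    return [2.0 * (t - c) for t, c in zip(theta, c_new)]


def replay_forgetting(theta: Vector) -> Tuple[float, Vector]:
    # Increase of the old-task loss ||theta - c_old||^2 over its value (0) at the reference.
    value = sum((t - c) ** 2 for t, c in zip(theta, c_old))
    grad = [2.0 * (t - c) for t, c in zip(theta, c_old)]
    return value, grad


theta, Q = list(c_old), 0.0
for _ in range(200):
    theta, Q = dpp_step(theta, Q, new_task_grad, replay_forgetting, V=1.0, delta=0.25, lr=0.1)
print(theta, Q)
```

In this toy the iterate settles where forgetting equals δ (θ ≈ (0.5, 0), Q ≈ 1), whereas fine-tuning on the new task alone would move to (1, 0), where forgetting is 1. A much smaller δ with the same learning rate drives Q, and with it the effective step size lr·(V + Q), toward instability, one illustration of why V, δ and the learning rate interact.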
This paper reframes continual learning (CL) as a long-term constrained stochastic control problem, applying the Drift-Plus-Penalty (DPP) framework from Lyapunov optimization to regulate catastrophic forgetting. The key idea is to maintain virtual queues that track cumulative constraint violations (forgetting) on past tasks, transforming the stability-plasticity trade-off into a queue stabilization problem. Two variants are proposed: COLD (using the previous model as reference) and COLD-ORACLE (using the best historical model). The main novelty lies in the explicit, tunable O(1/V) vs. O(V) trade-off between current-task performance and forgetting, governed by a single parameter V, along with theoretical guarantees that depend on task variation measures.
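For readers less familiar with DPP, the generic single-constraint Lyapunov argument shows where the V trade-off comes from. This is the textbook form from stochastic network optimization, not the paper's Theorems 1-4, and the notation is ours: f_t is the per-step penalty (for example the current-task loss), g_t the constraint function (for example forgetting minus a tolerance δ), Q_t the virtual queue, and B an assumed constant with g_t² ≤ 2B.

```latex
\begin{aligned}
Q_{t+1} &= \max\{Q_t + g_t(\theta_t),\, 0\}, \qquad L(Q) = \tfrac{1}{2}Q^2,\\
\Delta_t &= L(Q_{t+1}) - L(Q_t) \le \tfrac{1}{2}\,g_t(\theta_t)^2 + Q_t\, g_t(\theta_t) \le B + Q_t\, g_t(\theta_t),\\
\Delta_t + V f_t(\theta_t) &\le B + \underbrace{V f_t(\theta_t) + Q_t\, g_t(\theta_t)}_{\text{minimized by DPP}},
\qquad \theta_t = \arg\min_{\theta}\big[V f_t(\theta) + Q_t\, g_t(\theta)\big],\\
\sum_{t=0}^{T-1} g_t(\theta_t) &\le Q_T - Q_0 .
\end{aligned}
```

The first inequality uses max{x, 0}² ≤ x²; the last follows from Q_{t+1} ≥ Q_t + g_t(θ_t), so a queue that stays bounded forces the time-averaged violation to vanish. Summing the drift-plus-penalty bound over t and comparing with a feasible reference policy gives, under the usual assumptions of that analysis (e.g. i.i.d. randomness and a Slater-type condition), a time-averaged penalty within B/V of optimal and a queue of size O(V).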
Theoretical Analysis: The paper provides a multi-layered theoretical treatment. Theorem 1 bounds the optimality gap, Theorem 2 bounds the average queue length, and Theorems 3-4 extend results to gradient-based optimization. The authors correctly identify and address several non-trivial departures from standard DPP analysis: endogenous/trajectory-dependent constraints (rather than exogenous stochastic processes), non-stationary loss functions, and the need to benchmark against an idealized CL problem rather than a stationary solution. These are genuine technical challenges.
However, several aspects weaken the rigor:
Experimental Design: Experiments cover standard benchmarks (Split-CIFAR10/100, Split-TinyImageNet, PMNIST) with comprehensive ablations on V, δ, memory size, epochs, and batch size. The toy quadratic experiment convincingly validates the O(1/V) vs. O(V) trade-off. The comparison against 11 baselines is thorough. However, the architectures used (ResNet-18, MLPs) are relatively modest, and the task-incremental setting with known task identities is easier than the class-incremental setting, where task identities are unavailable at test time.
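The direction of the V trade-off can be reproduced in a deterministic 1-D toy of our own (not the paper's quadratic experiment): minimize f(θ) = (θ − a)² subject to the long-run constraint θ − b ≤ 0 with b < a, choosing θ_t at each step by exactly minimizing V·f(θ) + Q_t·(θ − b). The function `simulate_dpp` and the parameters `a`, `b` are hypothetical.

```python
def simulate_dpp(V: float, T: int, a: float = 1.0, b: float = 0.0):
    """Exact drift-plus-penalty on a hypothetical 1-D toy (closed form assumes V > 0.5).

    Penalty:    f(theta) = (theta - a) ** 2   (current-task loss)
    Constraint: g(theta) = theta - b <= 0 on time average, with b < a
    Returns (average loss, average violation, final queue).
    """
    queue = 0.0
    total_loss = 0.0
    total_violation = 0.0
    for _ in range(T):
        # Setting d/dtheta [V*(theta - a)**2 + queue*(theta - b)] to zero gives:
        theta = a - queue / (2.0 * V)
        violation = theta - b
        total_loss += (theta - a) ** 2
        total_violation += violation
        # Virtual queue update: Q <- max(Q + g, 0)
        queue = max(queue + violation, 0.0)
    return total_loss / T, total_violation / T, queue


for V in (1.0, 10.0, 100.0):
    loss, viol, q = simulate_dpp(V, T=200)
    print(f"V={V:>6}: avg loss={loss:.4f}  avg violation={viol:.4f}  final queue={q:.2f}")
```

Here θ_t = b + (a − b)·rᵗ with r = 1 − 1/(2V), so a larger V keeps θ_t near the unconstrained optimum longer (lower loss, more violation), and the queue approaches 2V(a − b), growing linearly in V. Because the queue never clips in this toy, it equals the cumulative violation exactly.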
Theoretical Impact: The paper provides one of the more principled theoretical frameworks for understanding CL dynamics. The explicit dependence of bounds on task variation measures (D_Φ[T]) is a genuine insight—showing that inter-task variability fundamentally limits CL performance regardless of the algorithm. The virtual queue mechanism offers a novel, algorithmically interpretable forgetting metric that tracks temporal evolution rather than just endpoint degradation.
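The point that the queue records how forgetting evolves, not only its final value, can be illustrated with two hypothetical runs that end with the same forgetting but differ along the way. The helper `queue_trajectory` and the tolerance value 0.1 are our assumptions, using the generic update Q ← max(Q + forgetting − δ, 0).

```python
from typing import List, Sequence


def queue_trajectory(forgetting: Sequence[float], delta: float) -> List[float]:
    """Virtual-queue values Q_1..Q_T for per-step forgetting measurements (Q_0 = 0).

    Hypothetical illustration: Q_{t+1} = max(Q_t + forgetting_t - delta, 0).
    """
    queue, history = 0.0, []
    for value in forgetting:
        queue = max(queue + value - delta, 0.0)
        history.append(queue)
    return history


# Two hypothetical runs with identical endpoint forgetting (0.2).
steady = [0.0, 0.0, 0.0, 0.2]
spiky = [0.5, 0.5, 0.5, 0.2]
print("endpoint forgetting:", steady[-1], spiky[-1])
print("final queue:", queue_trajectory(steady, 0.1)[-1], queue_trajectory(spiky, 0.1)[-1])
```

An endpoint metric cannot tell these runs apart, while the queue (about 0.1 versus 1.3) reflects the sustained forgetting in the second run.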
Practical Impact: The projection-free nature of COLD is a practical advantage over GEM/A-GEM, avoiding feasibility-region shrinkage under high task diversity. The method's simplicity (scalar queue updates, standard gradient steps) makes implementation straightforward. However, the need to tune V and δ, which the paper identifies as limitations, may reduce practical adoption. The competitive but not dramatically superior empirical results (marginal improvements over methods like CBA, DER++, REFRESH) suggest the primary value is theoretical rather than empirical.
Broader Influence: The control-theoretic perspective could influence how the community thinks about CL, potentially inspiring similar formulations for other sequential learning problems. The connection to stochastic network optimization may attract researchers from that community.
The paper addresses a genuinely important problem. Continual learning remains a critical bottleneck for deploying ML systems in non-stationary environments. The lack of principled theoretical frameworks with interpretable trade-offs is a recognized gap. The DPP perspective is timely given growing interest in constrained optimization approaches to CL and the need for methods with formal guarantees beyond per-step heuristics.
Generated Jun 9, 2026
Paper 1 addresses a highly timely and practically impactful problem—improving RL fine-tuning of LLMs, which is central to current AI development. DRPO offers a principled yet practical improvement over PPO/GRPO/DPPO with demonstrated gains across scales and architectures. The LLM post-training space is rapidly growing with enormous industry and research interest, giving this work broad immediate applicability. Paper 2 provides valuable theoretical contributions to continual learning via control theory, but continual learning remains a more niche area with less immediate industrial deployment. Paper 1's direct relevance to the LLM training pipeline gives it higher near-term impact potential.
Paper 1 offers a novel theoretical framework by applying control theory (Drift-Plus-Penalty) to continual learning, providing rigorous stability and convergence guarantees for the stability-plasticity trade-off. This foundational approach has high methodological rigor and potential for broad, lasting impact in ML theory. Paper 2, while highly timely and practical for on-device LLM safety, relies on existing techniques (soft prompts and distillation) and represents an empirical engineering contribution rather than a fundamental scientific breakthrough.
Paper 1 has higher likely impact due to a highly novel systems contribution (agent-driven megakernel synthesis with static schedule validation), strong real-world applicability for LLM inference deployment, and compelling end-to-end evidence (correctness vs HF, broad GPU retargeting, large adversarial validation set, and speedups over strong baselines on widely used inference GPUs). Its impact spans ML systems, compilers, GPU programming, and reliable agentic codegen. Paper 2 is timely and theoretically grounded for continual learning, but similar control/stability framings exist and practical adoption may be narrower than a drop-in inference acceleration/verification harness.
Paper 1 addresses quantum error correction, arguably the most critical bottleneck in realizing scalable quantum computing. Its proposed method offers dramatic, quantitative breakthroughs—a 97% reduction in training time and a 100% success rate—which could significantly accelerate the timeline for practical quantum systems. While Paper 2 provides excellent theoretical rigor for continual learning in AI, Paper 1's combination of extreme performance gains, noise robustness, and direct application to a fundamental hardware-software barrier gives it a higher potential for paradigm-shifting scientific impact.
Paper 1 likely has higher impact due to a broadly applicable, practical framework that removes meta-learning inner loops for neural fields while keeping modality-agnostic generality, yielding large efficiency gains (memory/batch) and strong results across diverse domains (images, 3D, climate) plus downstream tasks—suggesting immediate adoption potential. Paper 2 is methodologically rigorous and timely with theoretical guarantees for continual learning, but its impact may be narrower (replay-based CL setting) and more dependent on uptake of a specific control-theoretic formulation rather than a clear, general-purpose efficiency breakthrough.
Paper 2 likely has higher impact due to a more broadly applicable, principled framework for continual learning with explicit stability–plasticity control, backed by stability/convergence guarantees and empirical gains over strong baselines. The Drift-Plus-Penalty/control-theoretic angle is a novel, timely bridge between stochastic optimization/control and CL, with clear real-world relevance for nonstationary streaming data. Paper 1 offers valuable interpretability for diffusion architectures via an analytically solvable wavelet score parameterization, but its immediate applications are more diagnostic than transformative and may affect a narrower slice of practice.
Paper 2 offers a novel, rigorous theoretical framework bridging control theory and continual learning. By providing stability and convergence guarantees for the stability-plasticity trade-off, it addresses a fundamental and persistent AI challenge (catastrophic forgetting). While Paper 1 is highly timely for LLM compression, Paper 2's foundational contributions are likely to yield a broader and more lasting scientific impact across various machine learning domains.
Paper 1 addresses a critical and highly timely bottleneck in the rapid adoption of large language models: hardware-efficient deployment. By introducing a search-free, hardware-friendly mixed-precision quantization framework with custom GEMM kernels, it offers immediate, highly practical applications for industry and academia. While Paper 2 provides rigorous theoretical foundations for continual learning, Paper 1's direct applicability to the exploding LLM ecosystem suggests a higher potential for widespread near-term scientific impact and citation velocity.
Paper 2 has higher estimated scientific impact due to its broader, more foundational contribution: a control-theoretic framework for continual learning with formal stability/convergence guarantees and a tunable stability–plasticity mechanism. This combination of theory and empirical validation can generalize across architectures, tasks, and domains where nonstationarity matters, influencing both ML theory and practice. Paper 1 is timely and practically valuable for enterprise Text-to-Cypher benchmarking, but its impact is narrower (graph/NL2query evaluation tooling) and more deployment-specific, with less cross-field methodological generality.
Paper 1 introduces a novel control-theoretic foundation for continual learning, a fundamental and pervasive challenge in AI. Its rigorous mathematical approach, providing stability and convergence guarantees via the Drift-Plus-Penalty principle, offers a generalized framework that could broadly influence future theoretical and applied research across various machine learning domains, resulting in a deeper, long-lasting scientific impact compared to the more applied, though timely, LLM alignment focus of Paper 2.