Capacity-Constrained Online Convex Optimization with Delayed Feedback

Alexander Ryabchenko, Idan Attias, Daniel M. Roy

Jun 10, 2026arXiv:2606.11711v1

cs.LGstat.ML

#3594of 5669·cs.LG

#3594 of 5669 · cs.LG

Tournament Score

1370±43

10501750

40%

Win Rate

Wins

Losses

Matches

Rating

6.8/ 10

Significance7

Rigor8

Novelty7

Clarity7.5

Abstract

Online learning with delayed feedback typically assumes that the learner can track all pending rounds until their feedback arrives. In practice, tracking resources are finite, and feedback from untracked rounds is permanently lost. In this paper, we study delayed online convex optimization (OCO) under a hard capacity constraint, where at most $C$ pending rounds can be tracked at any time. To model delay information, we introduce a semi-clairvoyant model that refines the clairvoyant assumption from prior work: rather than requiring delays to be known at prediction time, the learner observes delay expirations online, consistent with the classical unconstrained delayed setting. Our approach proceeds via a reduction to a novel ``delayed and weighted'' OCO problem, using a scheduler that randomizes tracking decisions and importance-weights the resulting observations. For this base problem, we propose and analyze Delayed-Weighted FTRL and its bandit analogue, establishing regret bounds that explicitly characterize the interaction between time-varying weights and delayed feedback. Combining these base learners with our schedulers yields the first regret guarantees for capacity-constrained OCO under convex and strongly convex losses, for both first-order and bandit feedback. For first-order feedback, capacity $C = Ω(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. For bandit feedback, the regret rates are modulated by powers of $(1 + σ_{\text{max}}/C)$ , where $σ_{\text{max}}$ is the maximum number of pending observations at any time. This allows the regret bound to degrade gracefully when $C < σ_{\text{max}}$ , while remaining sublinear.

AI Impact Assessments

(1 models)

Scientific Impact Assessment

Core Contribution

This paper introduces and solves the problem of online convex optimization (OCO) and bandit convex optimization (BCO) under a hard capacity constraint on the number of simultaneously tracked pending observations. The key insight is that standard delayed online learning implicitly assumes unlimited tracking capacity — an assumption violated in many practical settings (clinical trials, resource-constrained systems). The paper extends the capacity-constrained framework of Ryabchenko, Attias, and Roy (2025), which handled finite-action settings (experts and multi-armed bandits), to continuous convex domains.

The technical approach involves a clean two-part decomposition: (1) a novel "delayed and weighted" OCO/BCO framework with algorithms (DW-FTRL and DW-FTBL), and (2) proxy-delay schedulers that reduce the capacity-constrained problem to this weighted problem via importance sampling. A key methodological contribution is the semi-clairvoyant delay model, which only reveals delays upon expiration rather than at prediction time — a more realistic assumption that aligns with standard delayed feedback settings.

Methodological Rigor

The theoretical development is thorough and carefully structured. Several aspects stand out:

Delayed-weighted regret analysis: The regret bounds for DW-FTRL (Theorem 3.1) and DW-FTBL (Theorem 3.2) cleanly characterize how time-varying importance weights interact with delayed feedback through pairwise terms $w_t \sum_{s \in B_t} w_s$ . The bandit case introduces additional $\ell_2$ cross-terms $({\sum_{s \in B_t} w_s^2})^{1/2}$ from variance accumulation of pending gradient estimators — a genuine technical challenge that the authors handle via Lemma C.1 (adapted from RAR26).

Scheduler design: The Chernoff-based saturation bound (Lemma 4.3) provides a clean sufficient condition for tracking set overflow probability $\leq e^{-C}$ . Two concrete schedulers — Pareto (adapted from prior work) and the novel Bernoulli scheduler — sit at different points of an adaptivity/uniformity tradeoff. The Bernoulli scheduler's uniform weights are essential for strongly convex and bandit guarantees, representing a genuine design innovation.

Doubling trick: The paper addresses the practical concern that $\sigma_{\max}$ may be unknown via a clean doubling trick (Appendix F) that preserves the asymptotic rates up to logarithmic factors.

The proofs are complete and self-contained in the appendices. The reduction structure is modular, making the analysis tractable and verifiable.

Potential Impact

Theoretical significance: The results demonstrate a surprising efficiency threshold: for first-order OCO, logarithmic capacity $C = \Omega(\log T)$ suffices to recover standard unconstrained delayed rates up to log factors. This is a strong positive result suggesting that even very limited tracking resources can preserve near-optimal learning. For BCO, the graceful degradation through factors of $(1 + \sigma_{\max}/C)^{1/4}$ (convex) and $(1 + \sigma_{\max}/C)^{1/3}$ (strongly convex) provides a clean quantitative picture of how capacity constraints affect learning.

Practical relevance: Capacity constraints naturally arise in clinical trials, A/B testing with limited monitoring infrastructure, and distributed systems. The connection to finite-buffer queueing systems (where jobs finding a full buffer are dropped) provides natural modeling motivation. The paper's use of $\sigma_{\max}$ (peak concurrency) rather than $d_{\max}$ (maximum delay) as the key complexity parameter is well-motivated for queueing-theoretic applications.

Broader framework value: The delayed-weighted OCO/BCO framework (Section 3) extends scale-free online learning to delayed settings and may find independent applications beyond the capacity-constrained setting. The proxy-delay scheduling framework is parameterized by general distributions, allowing future instantiations.

Timeliness & Relevance

The paper addresses a genuine practical gap in the online learning literature. Delayed feedback has received sustained attention, but the implicit assumption of unlimited tracking capacity has gone largely unquestioned until the recent work of RAR25. Extending this to continuous domains is a natural and timely next step, particularly given growing interest in resource-constrained learning systems. The connection between online learning and queueing theory highlighted by the authors represents a potentially fruitful research direction.

Strengths

1. Clean modular architecture: The scheduler/base-learner decomposition enables independent improvement of either component.

2. Comprehensive coverage: Results span four settings (convex/strongly convex × first-order/bandit), with explicit capacity dependence throughout.

3. Sharp thresholds: The $C = \Omega(\log T)$ sufficiency result for first-order OCO is elegant and practically encouraging.

4. Semi-clairvoyant model: Weakening the clairvoyant assumption from prior work without sacrificing rate quality is meaningful.

5. Graceful degradation: BCO bounds remain sublinear even when $C \ll \sigma_{\max}$ , avoiding catastrophic failure.

Limitations

1. No lower bounds: The paper establishes no information-theoretic lower bounds for capacity-constrained OCO/BCO. It is unclear whether the capacity-dependent factors $(1 + \sigma_{\max}/C)^{1/4}$ and $(1 + \sigma_{\max}/C)^{1/3}$ are tight.

2. Gap in BCO convex case: The delay-dependent term scales as $\sqrt{T\sigma_{\max}}$ rather than the optimal $\sqrt{d_{\text{tot}}}$ from unconstrained settings. The authors acknowledge this gap but leave it open.

3. Oblivious adversary only: The framework assumes an oblivious adversary; extension to adaptive adversaries is left as future work and would require fundamentally different techniques.

4. No experiments: The paper is purely theoretical. Empirical validation on practical scenarios (e.g., clinical trial simulations) would strengthen the contribution.

5. Knowledge requirements: While the doubling trick removes dependence on $\sigma_{\max}$ , some bounds still require knowledge of $d_{\text{tot}}$ or $T$ , handled via standard but cumbersome doubling.

Overall Assessment

This is a solid theoretical contribution that opens up the study of capacity-constrained online convex optimization. The results are the first of their kind, the technical machinery is sound and novel (particularly the delayed-weighted framework and Bernoulli scheduler), and the problem formulation is well-motivated. The main limitation is the absence of matching lower bounds, which prevents a complete characterization of the capacity-regret tradeoff. The paper makes meaningful progress on a practical and theoretically interesting problem.

Rating:6.8/ 10

Significance 7Rigor 8Novelty 7Clarity 7.5

Generated Jun 11, 2026

Comparison History (20)

Lostvs. Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

Paper 1 introduces a novel interdisciplinary framework connecting dynamical systems theory to neural network quantization, addressing a practical and timely problem in deploying models on resource-constrained devices. Its applicability to black-box and compiled networks with fused operators broadens its real-world impact significantly. The decoupling of sensitivity analysis from quantizer selection is a conceptual innovation with broad utility. Paper 2 makes solid theoretical contributions to online learning with capacity constraints, but addresses a more niche setting with primarily theoretical interest and narrower audience appeal.

claude-opus-4-6·Jun 12, 2026

Lostvs. Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers

Paper 1 addresses a highly practical and widely used setting (distributed and federated asynchronous SGD) and provides a strong theoretical foundation for an empirically successful technique (gradient clipping). Its direct relevance to scaling modern large-scale deep learning gives it higher potential for real-world application and broader impact compared to Paper 2, which focuses on a more specialized theoretical niche in online convex optimization.

gemini-3.1-pro-preview·Jun 12, 2026

Lostvs. CRAFTIIF: Cross-Resolution Analytic Four-Type Interpretable Isolation Forest for Multivariate Time Series Anomaly Detection

Paper 1 offers a highly practical, interpretable solution to multivariate time series anomaly detection, a ubiquitous problem across finance, healthcare, and IoT. Its comprehensive approach to multiple anomaly types and significant empirical performance improvements (+40.7% on VUS-PR) suggest immediate, broad real-world applicability. In contrast, Paper 2 provides valuable but highly specialized theoretical advancements in online convex optimization, which likely has a narrower, more constrained impact primarily within the theoretical machine learning community.

gemini-3.1-pro-preview·Jun 12, 2026

Wonvs. Uncertainty Estimation for Molecular Diffusion Models

Paper 1 addresses a fundamental and novel problem in online learning theory—capacity-constrained delayed feedback—introducing new theoretical frameworks (semi-clairvoyant model, delayed-weighted OCO reduction) with rigorous regret guarantees. It opens a new research direction with broad implications for resource-constrained learning systems. Paper 2 proposes a useful but incremental post-hoc uncertainty estimation method for molecular diffusion models. While practically relevant, it applies existing techniques (Laplace approximation) to a specific domain. Paper 1's theoretical contributions have broader impact across optimization, learning theory, and multiple application areas.

claude-opus-4-6·Jun 12, 2026

Lostvs. MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Paper 1 likely has higher impact due to its striking empirical leap in automated theorem proving (gold-medal-level IMO/USAMO performance) and a generally applicable test-time scaling paradigm (generator/verifier/refiner with population search and tournament selection). This is timely and broadly relevant to AI reasoning, formal methods, and verification, with clear real-world applications (reliable proof generation, program verification). Paper 2 is methodologically rigorous and novel within online learning theory, but its impact is narrower (specialized OCO with capacity constraints) and less likely to drive cross-field adoption compared to the high-visibility breakthrough in Paper 1.

gpt-5.2·Jun 12, 2026

Wonvs. ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms

Paper 2 introduces a novel theoretical framework for capacity-constrained online convex optimization with delayed feedback, addressing a practical gap in online learning theory. It provides new algorithmic contributions (Delayed-Weighted FTRL), a refined semi-clairvoyant delay model, and rigorous regret bounds that gracefully degrade with capacity constraints. This advances fundamental optimization theory with broad applicability. Paper 1, while useful as a benchmarking tool for symbolic regression, is more incremental—extending existing benchmarks with robustness evaluations. Benchmarks can be impactful but typically have narrower theoretical contribution compared to foundational algorithmic advances with provable guarantees.

claude-opus-4-6·Jun 11, 2026

Wonvs. MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

Paper 2 introduces a fundamentally new theoretical framework for online convex optimization with capacity constraints and delayed feedback—a novel problem formulation with broad applicability across online learning. It provides rigorous regret bounds, a new semi-clairvoyant model, and a reduction-based approach with clear theoretical contributions. Paper 1, while addressing practical issues in RLHF/GRPO for instruction following, offers more incremental engineering improvements (5% gains on one model) combining known techniques (prospect theory, temperature sampling). Paper 2's theoretical foundations are more likely to spawn follow-up work across multiple subfields of optimization and learning theory.

claude-opus-4-6·Jun 11, 2026

Wonvs. CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

Paper 2 has higher potential impact due to broader applicability and stronger methodological rigor: it introduces a new capacity-constrained delayed-feedback OCO model aligned with real resource limits, develops a general reduction plus new Delayed-Weighted FTRL (and bandit analogue), and proves first regret guarantees with graceful degradation in both first-order and bandit settings. These results are timely for large-scale/streaming systems and can influence multiple fields (online learning, control, networking, recommender systems). Paper 1 is promising but appears validated mainly on synthetic cardiac data, with narrower domain scope.

gpt-5.2·Jun 11, 2026

Lostvs. Towards Graph Foundation Models for Dynamics in Complex Networked Systems: Lessons from Super-Spreader Identification in Multilayer Networks

Paper 1 addresses the timely and high-impact topic of Graph Foundation Models (GFMs) for network dynamics, proposing a framework for inductive cross-network generalization. This connects to the rapidly growing foundation model paradigm and has broad applications across epidemiology, social networks, and complex systems. While Paper 2 makes solid theoretical contributions to online convex optimization with capacity constraints, it addresses a more niche problem with narrower audience. Paper 1's vision for GFMs in network dynamics, combined with demonstrated zero-shot generalization, positions it for broader interdisciplinary impact and future research directions.

claude-opus-4-6·Jun 11, 2026

Wonvs. Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

Paper 1 introduces a novel and rigorous theoretical framework for online convex optimization under realistic capacity constraints with delayed feedback—a previously unstudied but practically important setting. It provides formal regret guarantees, a new reduction technique, and characterizes fundamental tradeoffs between capacity and delay. Paper 2 identifies an interesting failure mode (categorical prior lock-in) in LLM in-context learning for structured data, but is more empirical and narrower in scope. Paper 1's theoretical contributions have broader applicability across online learning, optimization, and systems with resource constraints, giving it higher long-term impact potential.

claude-opus-4-6·Jun 11, 2026

#3594of 5669·cs.LG

#3594 of 5669 · cs.LG

Tournament Score

1370±43

10501750

40%

Win Rate

Wins

Losses

Matches

Rating

6.8/ 10

Significance7

Rigor8

Novelty7

Clarity7.5