Robust Regression of General ReLUs with Queries

Ilias Diakonikolas, Daniel M. Kane, Mingchen Ma

Jun 9, 2026arXiv:2606.11130v1

cs.LG

#4382of 5669·cs.LG

#4382 of 5669 · cs.LG

Tournament Score

1327±44

10501750

44%

Win Rate

Wins

Losses

Matches

Rating

7.5/ 10

Significance7.5

Rigor8.5

Novelty7.5

Clarity7

Abstract

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses $p o l y (d, 1 / ε)$ labeled examples and outputs a hypothesis with error $O (o p t) + ε$ , where $o p t$ is the squared loss of the best fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses $d polylog(1/ε)+\tilde{O}(\min\{1/p, 1/ε\})$ black-box label queries, where $p$ is the bias of the target function, and achieves error $O (o p t) + ε$ . We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires $\tildeΩ(d/ε)$ labels, unless it draws a super-polynomial number of unlabeled examples.

AI Impact Assessments

(1 models)

Scientific Impact Assessment

Core Contribution

This paper addresses the label complexity of agnostic learning of general ReLU activations σ(W·x − t) under Gaussian marginals with respect to squared loss. The key distinction from prior work is that it considers general ReLUs (arbitrary bias t, including large positive t), as opposed to the simpler homogeneous case (t ≤ 0). The main contributions are threefold:

1. Upper bound: A computationally efficient algorithm achieving O(opt) + ε error using d·polylog(1/ε) + Õ(min{1/p, 1/ε}) black-box label queries, where p = Φ(t̄*) is the bias of the optimal ReLU.

2. Query lower bound: An information-theoretic lower bound of Ω̃(1/p^{1-o_d(1)} + d) queries, showing the algorithm is qualitatively near-optimal.

3. Active learning lower bound: Pool-based active learners require Ω̃(d/(p·log(m))) labels unless the pool size m is super-polynomial, establishing a sharp separation between query learning and pool-based active learning.

Methodological Rigor

The paper demonstrates considerable technical sophistication across multiple fronts:

Algorithmic framework: The projected gradient descent analysis (Theorem 2.1) for spherical GLMs is elegantly connected to the Ornstein–Uhlenbeck semigroup, yielding a clean integral expression for noiseless error (Lemma 2.1). The initialization-dependent error guarantee parameterized by α is a useful structural insight that cleanly separates convergence analysis from initialization quality.

Gradient boosting via rejection sampling: The technique of querying examples in {x | w^(i)·x > t*} to amplify gradient signal while maintaining bounded variance (Lemma 2.9) is technically clever and well-motivated. This overcomes the fundamental challenge that gradient magnitudes shrink as poly(ε) for large thresholds.

Parameter decomposition: The decoupling of the update into directional (w) and scalar (r, t) components, justified by the noiseless error decomposition (Lemma 3.2), addresses the key difficulty that ‖W − W*‖ does not correctly characterize noiseless error for large thresholds. The careful interleaving of updates—maintaining w within a reasonable range before updating (r,t)—with potential analysis tracking B_i is technically sound.

Lower bounds: The query lower bound combines a standard Ω(d) information-theoretic argument with a novel adversarial noise construction that corrupts labels for high-norm examples. The active learning lower bound is particularly noteworthy, adapting ideas from halfspace classification lower bounds to the continuous-label regression setting, where the richer response structure creates significant technical challenges. The basis-change argument and volume-based counting are well-executed.

Hypothesis selection: Lemma 2.10 introduces a reweighting-based tournament selection procedure using only poly(k) queries, avoiding the naïve Ω(1/ε²) cost—a useful standalone tool.

Potential Impact

Theoretical learning theory: This paper essentially resolves the query complexity of agnostic ReLU regression under Gaussians, establishing matching upper and lower bounds up to polylogarithmic factors. The separation between query learning and pool-based active learning (Theorem 4.2) is a clean structural result of independent interest.

Broader applicability: The spherical GLM framework (Section 2) applies beyond ReLUs to general activation functions, potentially useful for analyzing other single-index models. The hypothesis selection procedure is applicable to general regression problems.

Practical relevance is limited: The algorithm requires exact knowledge of the Gaussian marginal, uses membership queries (stronger than label queries on drawn examples), and the constants hidden in O(opt) may be large. However, the conceptual insights about gradient boosting via conditional sampling could inspire practical active learning strategies.

Timeliness & Relevance

The paper is well-positioned relative to recent developments. The passive learning algorithms of [GV25, ZWDD25] achieved O(opt) + ε error with poly(d, 1/ε) samples, leaving open whether interactive access could reduce label complexity. The query learning framework of [DKM24] for halfspaces provides key building blocks that this paper extends non-trivially to the regression setting. The gap between t < 0 (where d·polylog(1/ε) labels suffice passively) and t > 0 (requiring Ω̃(d/ε) passively) makes the interactive setting a natural and timely question.

Strengths

1. Near-complete picture: The paper provides matching algorithmic and information-theoretic results, plus a separation between learning models—a rare trifecta.

2. Technical depth: The connection to Ornstein-Uhlenbeck semigroups, the two-level grid construction with exponentially smaller grid size than prior work, and the careful entanglement analysis between directional and scalar parameter errors demonstrate genuine technical innovation.

3. Clean modular structure: The warm-up (Section 2) builds intuition effectively before tackling the substantially harder general case (Section 3).

4. Standalone tools: The hypothesis selection procedure (Lemma 2.10) and the GLM learning framework have independent value.

Limitations

1. Query model strength: The algorithm uses membership queries to arbitrary points in ℝ^d, which is stronger than typical active learning. The practical relevance of this oracle is debatable.

2. Constant factors: The error guarantee is O(opt) + ε with unspecified multiplicative constant in O(opt), inheriting hardness barriers from passive learning.

3. Gaussian assumption: Results are specific to Gaussian marginals; extension to broader distribution families is unclear.

4. Gap in lower bound: The query lower bound has 1/p^{1-o_d(1)} versus the upper bound's 1/p, leaving a small gap that depends on dimension.

5. Presentation density: The 46-page paper, while thorough, could benefit from a more streamlined presentation of the general case.

Rating:7.5/ 10

Significance 7.5Rigor 8.5Novelty 7.5Clarity 7

Generated Jun 10, 2026

Comparison History (16)

Wonvs. Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning

Paper 2 has higher potential impact due to stronger novelty and rigor: it provides the first computationally efficient interactive (query-based) learner for agnostic learning of general ReLUs under Gaussians with improved (and near-optimal) query complexity, plus lower bounds clarifying necessity of queries. These theoretical guarantees are broadly relevant across learning theory, active learning, and robustness. Paper 1 proposes a practical forgetting heuristic for non-stationary deep RL and shows empirical benefits, but the contribution appears more incremental, with narrower scope and less formal grounding, likely limiting cross-field impact.

gpt-5.2·Jun 11, 2026

Wonvs. When Do Autoregressive Sequence Models Forecast Physical Wavefields? A Controlled Study on Synthetic Seismograms

Paper 2 has higher potential impact due to a clearer theoretical breakthrough: the first computationally efficient robust learner for general ReLUs in an interactive/query setting with near-optimal query complexity, plus complementary lower bounds showing necessity of queries for improvements over passive learning. This advances core learning theory (active/interactive learning, agnostic regression, complexity separations) with broad relevance across theoretical CS and foundations of ML. Paper 1 is a careful, useful controlled ablation study for autoregressive wavefield forecasting, but its contributions are more incremental and domain-specific, with narrower cross-field impact.

gpt-5.2·Jun 10, 2026

Wonvs. Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

Paper 2 offers a foundational theoretical advancement in machine learning by establishing near-optimal query complexity bounds for learning general ReLUs. While Paper 1 presents a highly relevant clinical application, its single-center retrospective design limits its generalizability. Paper 2's contributions to algorithmic theory and active learning have broader potential impact, as improvements in training fundamental neural network components cascade across all disciplines that rely on machine learning, whereas Paper 1's impact is relatively confined to cardiovascular risk stratification.

gemini-3.1-pro-preview·Jun 10, 2026

Wonvs. Data-Driven Runway and Taxiway Exits Prediction of Landing Aircraft: A Case Study at Hartsfield-Jackson Atlanta International Airport

Paper 1 makes fundamental theoretical contributions to learning theory by establishing near-optimal query complexity bounds for agnostically learning general ReLUs, a core problem in computational learning theory with broad implications for machine learning foundations. It provides both algorithmic results and matching lower bounds, demonstrating methodological rigor and novelty. Paper 2 is a well-executed but incremental applied study using standard ML classifiers for a narrow airport operations problem at a single airport, with limited generalizability and modest performance gains, resulting in lower potential for broad scientific impact.

claude-opus-4-6·Jun 10, 2026

Wonvs. Generalized TV--$\ell_p$ Structured Priors for Bayesian $T_1$ Mapping

Paper 1 addresses a fundamental problem in computational learning theory—agnostically learning ReLUs with near-optimal query complexity—providing both upper and lower bounds. Its contributions advance theoretical understanding of active/query learning for neural network components, with broad implications for machine learning theory. Paper 2 presents a useful but incremental extension (TV-ℓp priors) for Bayesian T1 mapping in MRI, a narrower application domain. Paper 1's novelty in establishing tight complexity bounds and the separation between passive and query learning models gives it broader theoretical impact across multiple fields.

claude-opus-4-6·Jun 10, 2026

Lostvs. A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding

Paper 2 offers a clear, direct application to real-world healthcare and neuroscience challenges (EEG decoding), giving it a broader interdisciplinary impact. While Paper 1 provides foundational theoretical contributions to machine learning, Paper 2 introduces a novel geometric framework combined with practical domain generalization, supported by experimental validation and open-source code, which is likely to drive immediate adoption and cross-field utility.

gemini-3.1-pro-preview·Jun 10, 2026

Wonvs. Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition

Paper 2 has higher likely scientific impact: it delivers first computationally efficient interactive (query-based) agnostic learning algorithm for general ReLUs with strong query complexity, plus near-optimality and lower bounds showing necessity of queries. This is a substantial theoretical advance with broadly relevant techniques for learning theory, active learning, and robust regression. Paper 1 is a useful applied contribution for conversational SER via a plug-and-play test-time memory adapter, but its impact is narrower (SER-focused) and the core idea (memory adapters) is closer to existing trends in LLM augmentation.

gpt-5.2·Jun 10, 2026

Lostvs. Machine-Learning Emulation of Satellite Greenhouse Gas Retrievals: Stability over Time

Paper 1 addresses a highly timely and critical global issue: monitoring greenhouse gases. By improving the computational efficiency and temporal stability of satellite retrievals, its findings have broad, immediate real-world applications in climate science and remote sensing. Paper 2 offers significant theoretical advancements in machine learning, but its impact is largely confined to theoretical computer science, whereas Paper 1 bridges applied ML with urgent environmental challenges, giving it a broader potential impact.

gemini-3.1-pro-preview·Jun 10, 2026

Lostvs. EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

Paper 1 likely has higher scientific impact due to strong timeliness and broad real-world applicability: test-time adaptation and prompt learning for LLM agents across heterogeneous task streams is a central, fast-moving area with immediate deployment relevance. Its multi-dataset setting and router/prompt co-evolution address practical interference issues and show large empirical gains, suggesting near-term uptake across AI systems and applications. Paper 2 is methodologically rigorous and novel in theory (query-efficient agnostic learning of general ReLUs), but its impact is narrower (specialized distributional assumptions and a more theoretical audience), with less direct applicability to current dominant ML practice.

gpt-5.2·Jun 10, 2026

Lostvs. OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib

While Paper 1 provides rigorous theoretical advancements in machine learning, Paper 2 introduces a highly relevant public benchmark in precision oncology. By harmonizing clinical-genomic data for cancer resistance prediction, Paper 2 offers immediate real-world applications, addresses a critical gap in translational medicine, and is likely to catalyze widespread follow-up research across both the computational biology and medical communities.

gemini-3.1-pro-preview·Jun 10, 2026

#4382of 5669·cs.LG

#4382 of 5669 · cs.LG

Tournament Score

1327±44

10501750

44%

Win Rate

Wins

Losses

Matches

Rating

7.5/ 10

Significance7.5

Rigor8.5

Novelty7.5

Clarity7