Xiaoli Yu, Jiamiao Liu
Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability above a floor p_min, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of pairs on samples. We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper–Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within of the best over the certified set, both non-vacuous whenever feasible; a regime-scoped third leg matches an external oracle, informative only where the risk margin and vacuous at the headline operating points. Relative to the range-only Hoeffding-ratio construction this sharpens the acceptance-floor dependence from 1/p_min to 1/√p_min, and a closed-form corollary identifies a per-pair regime in which our risk bound dominates a Hoeffding conformal risk control (Hoeffding–CRC) selective bound. Empirically, on ImageNet (three ResNets) and COCO val 2017 panoptic, the certificate opens a pp certified-acceptance frontier over Hoeffding–CRC and is tighter than a non-vacuous matched-valid baseline; these gains are regime-scoped, not universal, and absent on ADE20K. The certifier runs in time.
The paper addresses a genuine gap in the conformal prediction / selective prediction literature: no prior work provides a single finite-sample certificate that simultaneously guarantees (a) an upper bound on the selected risk, (b) a lower bound on the acceptance probability above the floor p_min, and (c) a lower bound on the deployment utility, all valid under adaptive two-threshold selection from a finite grid. The key technical insight is to treat the selected risk directly as a ratio rather than through a Hoeffding-style range bound, coupling three concentration inequalities: an empirical-Bernstein bound on the numerator, a Clopper–Pearson bound on the denominator (acceptance), and a Maurer–Pontil bound on utility. This coupling, combined with a deterministic-eligibility H-set union argument, yields an acceptance-floor dependence of 1/√p_min, versus 1/p_min for the range-only Hoeffding construction.
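To make the ratio treatment concrete, the following is a minimal sketch of how an upper bound on the selected risk could be assembled from an empirical-Bernstein bound on the numerator and a Clopper–Pearson bound on the acceptance probability. It is an illustration under stated assumptions, not the paper's construction: losses are assumed to lie in [0, 1], the failure budget is split by a plain union bound (delta/2 per leg) rather than the H-set argument, the utility leg is omitted, and the names `eb_upper`, `cp_lower`, and `selected_risk_upper` are hypothetical.

```python
import math


def eb_upper(values, delta):
    """Maurer-Pontil empirical-Bernstein upper bound on the mean.

    Assumes i.i.d. values in [0, 1]; holds with probability >= 1 - delta.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    log_term = math.log(2.0 / delta)
    return mean + math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def _binom_upper_tail(k, n, p):
    """P(Bin(n, p) >= k) for 0 < p < 1, with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def cp_lower(k, n, delta):
    """One-sided Clopper-Pearson lower bound on a binomial proportion.

    Bisects for the p at which P(Bin(n, p) >= k) = delta and returns the
    lower end of the bracket, so rounding errs on the conservative side.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if _binom_upper_tail(k, n, mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo


def selected_risk_upper(losses, accepted, delta):
    """Upper bound on E[loss | accept] = E[loss * 1{accept}] / P(accept).

    Hypothetical helper: delta is split evenly between the numerator and
    denominator bounds (union bound), an assumption of this sketch.
    """
    n = len(losses)
    numerators = [l if a else 0.0 for l, a in zip(losses, accepted)]
    num_up = eb_upper(numerators, delta / 2.0)
    k = sum(1 for a in accepted if a)
    acc_lo = cp_lower(k, n, delta / 2.0)
    if acc_lo <= 0.0:
        return 1.0  # vacuous: acceptance cannot be bounded away from zero
    return min(1.0, num_up / acc_lo)
```

To keep such a bound valid when one threshold pair is chosen adaptively from a grid of G pairs, the simplest (and likely looser) option is to call `selected_risk_upper` with `delta / G` for every pair; the review credits the paper with a sharper union argument than this.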
The paper is exceptionally rigorous in its theoretical development. The proofs are presented in full detail within the main text rather than deferred to appendices, which is unusual and commendable. Key strengths include:
One concern is the sample-size condition, which can be demanding at low acceptance floors. The paper acknowledges this, but the condition still limits practical applicability.
Direct applications: Safety-critical deployment of selective prediction systems (medical triage, autonomous driving perception, content moderation) where operators need simultaneous guarantees on error rates, system availability, and cost-effectiveness. The certificate provides exactly the deployment-level object a practitioner needs.
Methodological influence: The variance-adaptive ratio treatment and the H-set deterministic-eligibility argument could influence other post-selection inference problems. The regime-separation corollary (Theorem 9) provides practitioners with a closed-form diagnostic for choosing between methods.
Limitations on impact: The i.i.d. assumption and bounded-loss requirement restrict applicability. The paper honestly notes that distribution-shift robustness is not claimed, and heavy-tailed losses are left to future work. The gains are explicitly regime-scoped—absent on ADE20K and in high-variance/low-sample regimes.
The paper is well-timed. Conformal prediction has seen explosive growth, with recent extensions to non-monotone losses [8,9], selective CRC [6], and e-value selective prediction [7]. The gap this paper fills—joint certification of the full deployment triplet—is practical and recognized. The positioning table (Table 1) and related work section are thorough and fair, explicitly noting what other methods *could* potentially be extended to cover rather than claiming impossibility.
This is a technically strong, carefully executed contribution that fills a real gap in the selective conformal prediction literature. The combination of theoretical depth, honest scoping, and thorough empirical evaluation sets a high standard. The impact is somewhat limited by the regime-specificity of the gains and the practical demands of the sample-size condition, but the core joint-certificate construction is a genuine advance for deployable selective prediction.
Generated Jun 9, 2026
Paper 1 addresses the highly timely and critical problem of safely deploying machine learning models via conformal risk control. By providing tighter finite-sample certificates for adaptive selective prediction and demonstrating strong empirical gains on standard vision benchmarks, it offers immediate practical utility for trustworthy AI. While Paper 2 presents excellent theoretical results on an important learning theory problem, Paper 1's combination of rigorous statistical guarantees and direct applicability to modern ML deployment suggests a broader and more immediate scientific impact across both theory and practice.
Paper 1 bridges reinforcement learning and conversational AI, a highly timely intersection given the rise of interactive LLM agents. By incorporating proactive conversational queries into multi-objective bandits, it offers a novel, highly applicable solution to personalized recommendation systems. While Paper 2 provides rigorous theoretical advancements for conformal risk control, Paper 1 has broader appeal, more immediate real-world applications across user-facing AI systems, and aligns perfectly with current trends in human-AI interaction.
Paper 2 likely has higher impact: it analyzes a broadly relevant optimizer variant (polar-factor updates) with theoretical characterization (entropy-maximizing bias, exact spectral dynamics) and links to practical deep-learning regimes, potentially influencing optimization theory and algorithm design across many models. Its core concept generalizes beyond selective prediction to training dynamics, with timely relevance to LLM/Vision training. Paper 1 is methodologically solid and practically useful for certified selective conformal deployment, but its impact is more specialized to conformal risk control and selective prediction settings.
Paper 1 addresses the universally critical challenge of high training costs in machine learning through data pruning. By offering a plug-and-play framework with strong theoretical guarantees (unbiasedness, convergence) and demonstrating over 40% cost reduction on major datasets without performance loss, it presents a highly practical and broadly applicable solution. Paper 2's focus on conformal risk control for selective predictors is mathematically rigorous and important for AI safety, but its applications are more specialized, making Paper 1 likely to have a broader and more immediate impact across the field.
Paper 2 likely has higher scientific impact due to strong real-world applicability and timeliness: KV-cache compression and inference acceleration are central bottlenecks for LLM deployment, and the reported gains (up to 20× KV reduction, 3.1× throughput) plus open-source kernels enable rapid adoption across industry and research. Its contributions (adaptive rank control, hybrid decomposition, quantization, Triton kernels) affect systems, ML efficiency, and model serving broadly. Paper 1 is methodologically rigorous and novel in selective conformal certification, but its impact is narrower (selective risk control) and more regime-dependent empirically.
Paper 2 addresses a broadly relevant problem—fine-tuning large time series models—with a simple, practical, and generalizable method (SFF) applicable across eight major models and diverse tasks. Its novelty in smoothing non-convex loss landscapes via weight interpolation has wide applicability beyond time series to other foundation model domains. Paper 1, while rigorous and technically strong, targets a narrower niche (conformal risk control certificates for selective prediction) with more limited audience and applicability, and its gains are explicitly regime-scoped and not universal. Paper 2's breadth of impact and timeliness in the foundation model era give it higher potential impact.
Paper 2 has higher potential impact because it challenges fundamental assumptions in a widely active field (EEG/BCI deep learning), demonstrating that current benchmarks are saturated and that standard reconstruction metrics don't predict downstream utility. This has broad methodological implications for how the entire EEG denoising community evaluates progress, potentially redirecting research efforts. It also offers practical value (ultra-compact models for edge deployment). Paper 1, while technically rigorous, addresses a narrower statistical certification problem with gains that are regime-scoped and not universal, limiting its broader influence.
Paper 2 likely has higher impact: it provides a rigorous, finite-sample, adaptivity-valid certificate for selective prediction with concrete theoretical improvements (e.g., tightening dependence from 1/p_min to 1/√p_min) and broad applicability across ML deployment/safety settings. It couples multiple statistically principled bounds, offers complexity guarantees, and demonstrates gains on major benchmarks (ImageNet, COCO). Paper 1 addresses an important quantum problem but appears more incremental/empirical, with impact constrained by near-term quantum hardware and limited evidence of generality beyond the reported setup.
Paper 1 addresses the fundamental and highly active challenge of scalable multitask reinforcement learning, demonstrating that representation learning—not model-based planning—is the key driver of performance. This insight simplifies architectures, reduces computational overhead, and has broad implications across RL applications (robotics, control, game AI). Its clear, actionable finding that model-free methods with predictive representations can outperform complex world-model-based approaches is likely to influence a large research community. Paper 2 makes a solid but narrower contribution to conformal prediction theory with improvements in specific statistical regimes, limiting its breadth of impact.
Paper 2 presents a rigorous theoretical contribution with finite-sample certificates for selective conformal prediction, offering provable guarantees with clear mathematical improvements (1/√p_min vs 1/p_min dependence). It addresses fundamental problems in safe AI deployment with broad applicability across classification and segmentation tasks. Paper 1, while addressing a practical SER problem, is more incremental, applying an existing memory mechanism (Titans) as an adapter to existing audio LLMs. Paper 2's methodological rigor, theoretical novelty, and relevance to trustworthy ML give it broader and deeper potential impact.