Atif Hassan, Swanand Khare, Jiaul H. Paik
Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes high-loss samples when populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only 10% of the data, RCAP improves performance on class-imbalanced datasets compared to full-data training while providing an average speedup of 8.69×. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning
RCAP introduces a dynamic dataset pruning algorithm that explicitly addresses worst-group (class-wise) accuracy, a metric largely overlooked by prior dynamic pruning methods. The paper makes two interlocking contributions: (1) a closed-form solution (Theorem 3.1) for determining per-class subset sizes based on class-wise aggregated loss, derived via Lagrange optimization; and (2) a softmax-temperature-based adaptive sampling strategy that probabilistically selects high-loss samples within each class. The key insight is that classes with higher cumulative loss should receive proportionally more representation in the training subset, which naturally counteracts class imbalance and varying class difficulty.
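To make the second contribution concrete, the following is a minimal sketch of softmax-temperature sampling within a single class. It is not taken from the RCAP repository: the helper name `sample_high_loss`, the `temperature` argument, and the choice to sample without replacement are all assumptions made for illustration, and the paper's exact parameterisation may differ.

```python
import numpy as np


def sample_high_loss(losses, k, temperature=1.0, rng=None):
    """Draw k distinct sample indices, favouring high-loss samples.

    Hypothetical helper: a softmax over per-sample losses (scaled by an
    assumed `temperature`) gives the selection probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 <= k <= losses.size:
        raise ValueError("k must lie between 0 and the number of samples")

    # Numerically stable softmax: subtract the maximum before exponentiating.
    scaled = (losses - losses.max()) / temperature
    probs = np.exp(scaled)
    probs /= probs.sum()

    # Sampling without replacement keeps the selected indices distinct.
    # Note: a very small temperature can underflow probabilities to zero,
    # in which case NumPy raises if fewer than k entries remain non-zero.
    return rng.choice(losses.size, size=k, replace=False, p=probs)
```

Under this sketch, a lower temperature concentrates the draw on the highest-loss samples, while a higher temperature moves it toward uniform sampling within the class.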
The problem addressed — maintaining robustness (worst-group accuracy) under aggressive data pruning — is practically relevant but has not been the primary focus of prior dynamic pruning work. Most existing methods (InfoBatch, RS2) optimize for average accuracy, which can mask severe performance degradation on minority or hard classes.
Theoretical Foundation: Theorem 3.1 provides a clean, closed-form solution for optimal class-wise retention fractions under the assumption of full-batch gradient descent. The derivation is straightforward Lagrange multiplier optimization of the total empirical error with a budget constraint. However, the assumption of full-batch gradient descent is a significant gap — practical training uses mini-batch SGD, and the theorem's optimality guarantees do not directly transfer. The paper acknowledges using epoch-t statistics as a proxy for epoch t+1 and cites precedent for this history-based approach, but doesn't formally bound the approximation error.
The connection between minimizing the right-hand side of Equation 10 and selecting high-loss samples relies on an empirical observation (monotonic relationship between loss and gradient norm), supported only by a toy experiment on 90 examples (Figure 3 in supplementary). This is a weak link — the monotonic relationship is not proven for general architectures or loss landscapes, and the toy example is far from representative of realistic training scenarios.
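For context, the monotonic relationship does hold in one restricted setting, which helps show where the gap lies. The worked bound below is a standard calculation and is not taken from the paper. It assumes softmax cross-entropy with K classes and considers only the gradient with respect to the logits z, with y the true class and e_y its one-hot vector.

```latex
\begin{align*}
\ell &= -\log p_y, \qquad p = \mathrm{softmax}(z), \qquad \nabla_z \ell = p - e_y, \\
\|\nabla_z \ell\|_2^2 &= (1 - p_y)^2 + \sum_{k \neq y} p_k^2,
  \qquad \sum_{k \neq y} p_k = 1 - p_y = 1 - e^{-\ell}, \\
\sqrt{\tfrac{K}{K-1}}\,\bigl(1 - e^{-\ell}\bigr)
  &\le \|\nabla_z \ell\|_2 \le \sqrt{2}\,\bigl(1 - e^{-\ell}\bigr).
\end{align*}
```

Both bounds increase with the loss, so the logit-gradient norm is sandwiched between two monotone functions of the loss. The gradient with respect to the model parameters additionally passes through the network's Jacobian, which varies per sample, so this argument does not carry over to parameter-gradient norms in general. That is the step the toy experiment is left to support.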
Experimental Design: The evaluation is comprehensive: six datasets (balanced to extremely imbalanced), five architectures, three training paradigms, four pruning rates, and seven baselines. Results are averaged over three seeds with standard deviations reported. The breadth of evaluation is a clear strength. The claim that static methods are evaluated in their "best-case scenario" (same architecture for pruning and training) is a fair point that strengthens the comparison.
Potential Concerns: The β hyperparameter requires per-dataset, per-pruning-rate tuning (Table 4 shows different values across all settings), which somewhat undermines the claim of simplicity. The ablation study (Figure 2) reveals that performance is sensitive to β, especially at high pruning rates, which could limit practical usability. The handling of the αj > 1 boundary condition (Equation 5) through redistribution is pragmatic but ad-hoc.
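To illustrate the kind of redistribution at issue, here is a minimal sketch of a loss-proportional class budget with a cap at 1. It does not reproduce Theorem 3.1 or Equation 5, neither of which is quoted in this review: the proportional-to-aggregated-loss rule, the iterative cap-and-redistribute loop, and the helper name `class_fractions` are assumptions chosen only to show the boundary behaviour.

```python
import numpy as np


def class_fractions(class_loss, class_size, keep_ratio):
    """Per-class retention fractions under a global budget (hypothetical).

    Assumed rule: kept counts are proportional to each class's aggregated
    loss; any class whose share exceeds its size is capped (fraction 1)
    and the surplus is re-split among the remaining classes.
    """
    class_loss = np.asarray(class_loss, dtype=float)
    class_size = np.asarray(class_size, dtype=float)
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if np.any(class_loss <= 0) or np.any(class_size <= 0):
        raise ValueError("losses and class sizes must be positive")

    remaining = keep_ratio * class_size.sum()  # samples still to allocate
    counts = np.zeros_like(class_size)
    free = np.ones(class_size.shape, dtype=bool)  # classes not yet capped

    while free.any():
        share = np.zeros_like(class_size)
        share[free] = remaining * class_loss[free] / class_loss[free].sum()
        over = free & (share > class_size)
        if not over.any():
            counts[free] = share[free]
            break
        # Cap the overflowing classes and redistribute what is left.
        counts[over] = class_size[over]
        remaining -= class_size[over].sum()
        free &= ~over

    return counts / class_size
```

For example, with aggregated losses of 9 and 1, class sizes of 10 and 90, and a 20% budget, the small high-loss class would be assigned 18 samples under the proportional rule. The sketch caps it at its 10 available samples and hands the remaining 10 to the second class.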
RCAP addresses a genuine need in deploying deep learning models fairly: ensuring that computational savings from data pruning don't disproportionately harm underrepresented classes. This is relevant for applications in medical imaging, autonomous driving, and any domain where class imbalance is inherent and worst-case performance matters.
The reported results are striking — on class-imbalanced datasets, RCAP with only 10% data outperforms full-data training in worst-group accuracy (e.g., iNaturalist: 68.49% vs 69.66% at 90% pruning, CelebA: 91.24% vs 90.14%). The 8.69× speedup is practically significant. The O(1) per-sample overhead claim (since everything is computed from forward-pass losses) makes integration into existing pipelines straightforward.
However, the scope is limited to classification tasks. Extension to other tasks (detection, segmentation, generation) is non-trivial and not discussed. The limitation regarding LLMs is acknowledged but leaves a significant gap given current research priorities.
The paper addresses a timely intersection of two important trends: (a) reducing training costs for deep learning, and (b) ensuring fairness/robustness across subgroups. The emphasis on worst-group accuracy aligns with growing awareness of algorithmic fairness. Dynamic pruning remains an active research area, and incorporating class awareness is a natural and needed extension.
That said, the paper's focus on image classification with relatively standard architectures (ResNet, EfficientNet) may feel incremental given the field's shift toward foundation models and LLMs. The ImageNet experiment using a frozen DINOv2 backbone with a 2-layer MLP is more of a transfer learning evaluation than a true large-scale training experiment.
The comparison landscape is fair but could be strengthened. The paper compares against methods published through 2024, which is appropriate. However, some results raise questions — e.g., UCB producing random outputs on 4/6 datasets suggests possible implementation issues rather than fundamental method failure. The paper could also benefit from analysis of the learned α distributions over training to provide intuition about the method's behavior.
Generated Jun 11, 2026
Paper 1 offers a highly practical solution to reduce computational costs in model training, a critical issue in modern ML. It provides extensive empirical validation across multiple datasets, models, and paradigms, reporting concrete, significant improvements (8.69x speedup, better accuracy with 10% data) and includes open-source code. Paper 2 presents an interesting theoretical concept but lacks the quantitative evidence and breadth of evaluation demonstrated in Paper 1.
Paper 2 likely has higher scientific impact due to broader applicability and timeliness: robust dynamic dataset pruning affects training efficiency across many models, datasets, and deployment settings, with clear speedups and accuracy gains (including worst-group accuracy). Its method is generally usable in standard ML pipelines and supported by extensive evaluation and released code. Paper 1 is novel but more niche—focused on memristor-based analog ASR and ADC/positional encoding interactions—impactful for specialized hardware-ML co-design but narrower in scope and immediate adoption.
Paper 2 addresses a critical gap in molecular diffusion models—uncertainty quantification—which has broad implications for drug discovery, materials science, and generative modeling. The method is principled (Laplace approximation), post-hoc (applicable to pretrained models), and enables practical test-time filtering. This combines timeliness (diffusion models are rapidly growing) with high real-world impact in computational chemistry. Paper 1, while solid and practical, represents an incremental improvement in dataset pruning with class-awareness, a more mature and narrower subfield with less transformative potential.
Paper 2 likely has higher scientific impact: it proposes a novel, training-free modification to classifier-guided diffusion sampling to explore low-density (tail) regions, directly addressing a timely and widely relevant limitation of diffusion models. The approach could be broadly adopted across generative modeling tasks and domains (images, audio, etc.) with minimal overhead, and aligns with strong current interest in controllable sampling, coverage/recall, and guidance methods. Paper 1 is practical and solid for efficient training and robustness, but its impact is narrower to dataset pruning in supervised classification.
Paper 1 has higher likely scientific impact due to clear novelty within an active ML area (dynamic dataset pruning with robustness/worst-group accuracy guarantees), strong real-world applicability (training cost reduction with fairness/robustness benefits), and extensive empirical validation across multiple datasets, models, and training paradigms with released code. Paper 2 is conceptually innovative and mathematically rigorous, but appears more niche (specialized gauge-invariant readouts) with less demonstrated breadth of applications and no empirical evidence of downstream performance, making near-term cross-field and practical impact less certain.
Paper 1 likely has higher impact due to stronger novelty and broader relevance: it targets agentic RL for LLM tool-use, a rapidly growing area, and introduces fine-grained branching/credit assignment beyond common heuristic units. The method is evaluated across 13 benchmarks with consistent gains over strong baselines, suggesting robustness and wide applicability to agentic systems. Paper 2 is practically valuable for efficient training and robustness under class imbalance, but dynamic pruning is a more mature niche; its impact is likely narrower to supervised classification pipelines.
Paper 2 is likely to have higher scientific impact due to greater novelty (adapting implicit neural representations to policy/behavior modeling with episode-level latent modulation), broader cross-domain relevance (robotics, games, racing, chess, demonstrations), and timely alignment with self-supervised learning on heterogeneous unlabeled behavior data. It also introduces a useful OOD taxonomy at the policy level. Paper 1 is practically valuable and methodologically solid for efficient, robust training via class-aware dynamic pruning, but its conceptual scope is narrower and more incremental within the data pruning/efficiency literature.
RCAP addresses a fundamental and broadly applicable problem in machine learning—efficient training via dynamic dataset pruning with robustness to class imbalance. It demonstrates consistent improvements across 6 datasets, 5 models, and 3 training paradigms, showing strong generalizability. The finding that 10% of data can outperform full training on imbalanced datasets is practically significant, offering ~8.7x speedups. In contrast, Claw-SWE-Bench is a narrower benchmark contribution for evaluating coding agents, primarily serving the SWE-bench community. RCAP's broader applicability across ML domains and its methodological contribution give it higher potential impact.
Paper 2 likely has higher scientific impact: it introduces a broadly applicable, novel training-time algorithm (class-aware probabilistic dynamic pruning) with strong claimed gains in efficiency and worst-group accuracy across multiple datasets, models, and paradigms, increasing potential adoption across ML and applied domains. Its methodological contribution generalizes beyond a single institution or disease context and is timely given compute constraints and fairness/robustness concerns. Paper 1 is clinically relevant and interpretable but is single-center, retrospective, and may face generalizability and deployment barriers, limiting breadth of impact.
Paper 2 addresses a fundamental and timely question in deep learning—understanding which subquadratic architectures best replace Transformers—with broad implications for LLMs, time-series modeling, and efficient AI. Its unified theoretical framework comparing xLSTM, Mamba-2, and Gated DeltaNet, combined with principled analysis of memory dynamics and state tracking, provides foundational insights for architecture design. Paper 1 offers a solid incremental contribution to dataset pruning with class-aware sampling, but its scope is narrower (classification pruning) and impact more specialized compared to Paper 2's influence on the widely studied efficient sequence modeling paradigm.