RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

Atif Hassan, Swanand Khare, Jiaul H. Paik

Jun 10, 2026 · arXiv:2606.11761v1

cs.LG

#4324 of 5669 · cs.LG

Tournament Score: 1330±43

Win Rate: 38%

Rating: 6.5/10

Significance 6.5 · Rigor 6 · Novelty 6.5 · Clarity 7

Abstract

Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples having high loss for populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only $10\%$ data, RCAP delivers $>1\%$ improvement in performance on class-imbalanced datasets compared to full data training while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning

AI Impact Assessments

(1 model)

Scientific Impact Assessment: RCAP — Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

1. Core Contribution

RCAP introduces a dynamic dataset pruning algorithm that explicitly addresses worst-group (class-wise) accuracy, a metric largely overlooked by prior dynamic pruning methods. The paper makes two interlocking contributions: (1) a closed-form solution (Theorem 3.1) for determining per-class subset sizes based on class-wise aggregated loss, derived via Lagrange optimization; and (2) a softmax-temperature-based adaptive sampling strategy that probabilistically selects high-loss samples within each class. The key insight is that classes with higher cumulative loss should receive proportionally more representation in the training subset, which naturally counteracts class imbalance and varying class difficulty.
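The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes plain loss-proportional budgets in place of the exact Theorem 3.1 formula, and the names `class_budgets`, `softmax_probs`, `sample_without_replacement` and the `temperature` parameter are hypothetical.

```python
import math
import random


def class_budgets(class_loss, total_budget):
    """Split a sample budget across classes in proportion to aggregated loss.

    Assumption: plain proportionality and a positive total loss; this is not
    the paper's Theorem 3.1 formula. Rounded values may not sum exactly to
    total_budget.
    """
    total = sum(class_loss.values())
    return {c: round(total_budget * loss / total) for c, loss in class_loss.items()}


def softmax_probs(losses, temperature):
    """Softmax over per-sample losses; a lower temperature favors high loss."""
    m = max(losses)  # subtract the max for numerical stability
    weights = [math.exp((loss - m) / temperature) for loss in losses]
    z = sum(weights)
    return [w / z for w in weights]


def sample_without_replacement(indices, probs, k, rng):
    """Draw up to k distinct indices, each draw proportional to probs."""
    pool = list(zip(indices, probs))
    chosen = []
    for _ in range(min(k, len(pool))):
        r = rng.random() * sum(p for _, p in pool)
        acc = 0.0
        pos = 0
        for pos, (_, p) in enumerate(pool):
            acc += p
            if r <= acc:
                break
        chosen.append(pool.pop(pos)[0])
    return chosen


# Toy usage: two classes, then three samples inside one class.
budgets = class_budgets({"cat": 3.0, "dog": 1.0}, total_budget=8)
probs = softmax_probs([0.1, 2.0, 0.5], temperature=0.5)
picked = sample_without_replacement([10, 11, 12], probs, k=2, rng=random.Random(0))
```

Sampling probabilistically rather than taking the top-k losses keeps some low-loss samples in play, which is presumably the point of the "probabilistic" part of the method's name.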

The problem addressed — maintaining robustness (worst-group accuracy) under aggressive data pruning — is practically relevant but has not been the primary focus of prior dynamic pruning work. Most existing methods (InfoBatch, RS2) optimize for average accuracy, which can mask severe performance degradation on minority or hard classes.
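To make the masking effect concrete, here is a small sketch that treats each class as a group, as this assessment does. The labels and predictions are made-up toy data, and `per_class_accuracy` is a hypothetical helper.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {class: accuracy} computed over the samples of each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


# Toy data: 95 majority samples (all correct), 5 minority samples (1 correct).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [0, 0, 0, 0, 1]

acc = per_class_accuracy(y_true, y_pred)
average_accuracy = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
worst_group_accuracy = min(acc.values())
```

Here the sample-averaged accuracy is 0.96 while the worst-group accuracy is 0.2, which is the kind of gap an average-only metric hides.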

2. Methodological Rigor

Theoretical Foundation: Theorem 3.1 provides a clean, closed-form solution for optimal class-wise retention fractions under the assumption of full-batch gradient descent. The derivation is straightforward Lagrange multiplier optimization of the total empirical error with a budget constraint. However, the assumption of full-batch gradient descent is a significant gap — practical training uses mini-batch SGD, and the theorem's optimality guarantees do not directly transfer. The paper acknowledges using epoch-t statistics as a proxy for epoch t+1 and cites precedent for this history-based approach, but doesn't formally bound the approximation error.

The connection between minimizing the right-hand side of Equation 10 and selecting high-loss samples relies on an empirical observation (monotonic relationship between loss and gradient norm), supported only by a toy experiment on 90 examples (Figure 3 in supplementary). This is a weak link — the monotonic relationship is not proven for general architectures or loss landscapes, and the toy example is far from representative of realistic training scenarios.
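For intuition about where such a monotonic relationship does hold exactly, consider the simplest case: binary cross-entropy differentiated with respect to the logit. The sketch below is not the paper's toy experiment; it only checks that, in this special case, the gradient magnitude equals 1 - exp(-loss) and therefore grows with the loss.

```python
import math


def loss_and_grad_norm(logit, label):
    """Binary cross-entropy and |dLoss/dlogit| for a label in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p_true = p if label == 1 else 1.0 - p
    loss = -math.log(p_true)
    grad_norm = abs(p - label)  # equals 1 - p_true
    return loss, grad_norm


# Sort a few samples by loss and check that gradient magnitudes follow suit.
pairs = sorted(loss_and_grad_norm(z, 1) for z in [-4.0, -2.0, -0.5, 0.0, 1.0, 3.0])
grad_norms = [g for _, g in pairs]
is_monotonic = all(a <= b for a, b in zip(grad_norms, grad_norms[1:]))
```

The guarantee stops at the logit. The parameter-gradient norm also depends on the input and on everything backpropagation passes through (for logistic regression it is |p - y| times the feature norm), so two samples with equal loss can have different gradient norms. That is why the general claim needs more than a toy demonstration.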

Experimental Design: The evaluation is comprehensive: six datasets (balanced to extremely imbalanced), five architectures, three training paradigms, four pruning rates, and seven baselines. Results are averaged over three seeds with standard deviations reported. The breadth of evaluation is a clear strength. The claim that static methods are evaluated in their "best-case scenario" (same architecture for pruning and training) is a fair point that strengthens the comparison.

Potential Concerns: The β hyperparameter requires per-dataset, per-pruning-rate tuning (Table 4 shows different values across all settings), which somewhat undermines the claim of simplicity. The ablation study (Figure 2) reveals that performance is sensitive to β, especially at high pruning rates, which could limit practical usability. The handling of the αj > 1 boundary condition (Equation 5) through redistribution is pragmatic but ad-hoc.
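One plausible way to handle fractions that exceed 1 is to cap them and hand the freed budget back to the remaining classes. The sketch below illustrates that idea under the assumption of loss-proportional allocation and non-negative losses; it is not a reproduction of the paper's Equation 5, and `capped_fractions` is a hypothetical name.

```python
def capped_fractions(class_loss, class_size, total_budget):
    """Loss-proportional retention fractions alpha_j, each capped at 1.

    Classes whose share would exceed their size are kept whole, and the
    leftover budget is re-split among the remaining classes.
    """
    alpha = {}
    free = set(class_loss)
    budget = float(total_budget)
    while free:
        loss_sum = sum(class_loss[c] for c in free)
        if loss_sum <= 0:
            # No loss signal left: split the remaining budget by class size.
            size_sum = sum(class_size[c] for c in free)
            for c in free:
                alpha[c] = min(1.0, budget / size_sum)
            break
        want = {c: budget * class_loss[c] / loss_sum for c in free}
        over = {c for c in free if want[c] > class_size[c]}
        if not over:
            for c in free:
                alpha[c] = want[c] / class_size[c]
            break
        for c in over:
            alpha[c] = 1.0  # keep the whole class
            budget -= class_size[c]
        free -= over
    return alpha


# Toy usage: a small, hard class and a large, easy one.
class_size = {"rare": 10, "common": 90}
alpha = capped_fractions({"rare": 9.0, "common": 1.0}, class_size, total_budget=50)
kept = sum(alpha[c] * class_size[c] for c in alpha)
```

In this toy case the rare class would be asked for 45 samples but only has 10, so it is kept whole and the other 40 slots go to the common class, preserving the overall budget of 50.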

3. Potential Impact

RCAP addresses a genuine need in deploying deep learning models fairly: ensuring that computational savings from data pruning don't disproportionately harm underrepresented classes. This is relevant for applications in medical imaging, autonomous driving, and any domain where class imbalance is inherent and worst-case performance matters.

The reported results are striking: on class-imbalanced datasets, RCAP with only 10% of the data outperforms full-data training in worst-group accuracy (e.g., at 90% pruning, iNaturalist: 69.66% vs. 68.49%; CelebA: 91.24% vs. 90.14%). The 8.69× speedup is practically significant. The O(1) per-sample overhead claim (since everything is computed from forward-pass losses) makes integration into existing pipelines straightforward.
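As a rough sanity check on the speedup figure, suppose per-epoch cost scales linearly with the number of retained samples and the number of epochs is unchanged. Both are simplifying assumptions made here, not statements from the paper. A pruning rate $r$ then bounds the speedup by $1/(1-r)$:

```latex
\[
S_{\text{ideal}}(r) = \frac{1}{1-r},
\qquad
S_{\text{ideal}}(0.9) = 10,
\qquad
\frac{8.69}{S_{\text{ideal}}(0.9)} \approx 0.87 .
\]
```

Under those assumptions, the reported average would correspond to roughly 87% of the ideal saving at 90% pruning, with the remainder plausibly spent on loss bookkeeping, sampling, and fixed per-epoch costs.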

However, the scope is limited to classification tasks. Extension to other tasks (detection, segmentation, generation) is non-trivial and not discussed. The limitation regarding LLMs is acknowledged but leaves a significant gap given current research priorities.

4. Timeliness & Relevance

The paper addresses a timely intersection of two important trends: (a) reducing training costs for deep learning, and (b) ensuring fairness/robustness across subgroups. The emphasis on worst-group accuracy aligns with growing awareness of algorithmic fairness. Dynamic pruning remains an active research area, and incorporating class awareness is a natural and needed extension.

That said, the paper's focus on image classification with relatively standard architectures (ResNet, EfficientNet) may feel incremental given the field's shift toward foundation models and LLMs. The ImageNet experiment using a frozen DINOv2 backbone with a 2-layer MLP is more of a transfer learning evaluation than a true large-scale training experiment.

5. Strengths & Limitations

Strengths:

Clean mathematical formulation with a closed-form solution for per-class allocation

Comprehensive experimental evaluation across diverse settings

Consistent and substantial improvements in worst-group accuracy, especially at high pruning rates

No computational overhead beyond standard forward pass

Code availability enhances reproducibility

The observation that aggressive pruning can *improve* worst-group accuracy on imbalanced datasets is insightful

Limitations:

The theoretical analysis relies on full-batch gradient descent assumption, limiting formal guarantees

The monotonic loss-gradient relationship is only empirically demonstrated on a trivial toy problem

β requires manual tuning per dataset/pruning-rate (8 different values across 6 datasets × 4 pruning rates)

The clipping threshold (Equation 11d) using epoch-0 maximum loss is somewhat arbitrary

Average-group accuracy on balanced datasets (CIFAR10/100) shows RCAP is sometimes slightly below the best baselines, suggesting a tradeoff

No evaluation on NLP tasks, despite the method being applicable to any classification task

The paper doesn't discuss computational cost of the initial forward pass through randomly initialized network

Limited analysis of when/why the method might fail

Additional Observations

The comparison landscape is fair but could be strengthened. The paper compares against methods published through 2024, which is appropriate. However, some results raise questions — e.g., UCB producing random outputs on 4/6 datasets suggests possible implementation issues rather than fundamental method failure. The paper could also benefit from analysis of the learned α distributions over training to provide intuition about the method's behavior.

Rating: 6.5/10

Significance 6.5 · Rigor 6 · Novelty 6.5 · Clarity 7

Generated Jun 11, 2026

Comparison History (21)

Won vs. Distributional Loss for Robust Classification

Paper 1 offers a highly practical solution to reduce computational costs in model training, a critical issue in modern ML. It provides extensive empirical validation across multiple datasets, models, and paradigms, reporting concrete, significant improvements (8.69x speedup, better accuracy with 10% data) and includes open-source code. Paper 2 presents an interesting theoretical concept but lacks the quantitative evidence and breadth of evaluation demonstrated in Paper 1.

gemini-3.1-pro-preview·Jun 12, 2026

Won vs. Positional Encoding in the Context of Memristor-Based Analog Computation for Automatic Speech Recognition

Paper 2 likely has higher scientific impact due to broader applicability and timeliness: robust dynamic dataset pruning affects training efficiency across many models, datasets, and deployment settings, with clear speedups and accuracy gains (including worst-group accuracy). Its method is generally usable in standard ML pipelines and supported by extensive evaluation and released code. Paper 1 is novel but more niche—focused on memristor-based analog ASR and ADC/positional encoding interactions—impactful for specialized hardware-ML co-design but narrower in scope and immediate adoption.

gpt-5.2·Jun 12, 2026

Lost vs. Uncertainty Estimation for Molecular Diffusion Models

Paper 2 addresses a critical gap in molecular diffusion models—uncertainty quantification—which has broad implications for drug discovery, materials science, and generative modeling. The method is principled (Laplace approximation), post-hoc (applicable to pretrained models), and enables practical test-time filtering. This combines timeliness (diffusion models are rapidly growing) with high real-world impact in computational chemistry. Paper 1, while solid and practical, represents an incremental improvement in dataset pruning with class-awareness, a more mature and narrower subfield with less transformative potential.

claude-opus-4-6·Jun 12, 2026

Lost vs. Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling

Paper 2 likely has higher scientific impact: it proposes a novel, training-free modification to classifier-guided diffusion sampling to explore low-density (tail) regions, directly addressing a timely and widely relevant limitation of diffusion models. The approach could be broadly adopted across generative modeling tasks and domains (images, audio, etc.) with minimal overhead, and aligns with strong current interest in controllable sampling, coverage/recall, and guidance methods. Paper 1 is practical and solid for efficient training and robustness, but its impact is narrower to dataset pruning in supervised classification.

gpt-5.2·Jun 12, 2026

Won vs. Adjusted Cup-Product Neural Layer

Paper 1 has higher likely scientific impact due to clear novelty within an active ML area (dynamic dataset pruning with robustness/worst-group accuracy guarantees), strong real-world applicability (training cost reduction with fairness/robustness benefits), and extensive empirical validation across multiple datasets, models, and training paradigms with released code. Paper 2 is conceptually innovative and mathematically rigorous, but appears more niche (specialized gauge-invariant readouts) with less demonstrated breadth of applications and no empirical evidence of downstream performance, making near-term cross-field and practical impact less certain.

gpt-5.2·Jun 12, 2026

Lost vs. APPO: Agentic Procedural Policy Optimization

Paper 1 likely has higher impact due to stronger novelty and broader relevance: it targets agentic RL for LLM tool-use, a rapidly growing area, and introduces fine-grained branching/credit assignment beyond common heuristic units. The method is evaluated across 13 benchmarks with consistent gains over strong baselines, suggesting robustness and wide applicability to agentic systems. Paper 2 is practically valuable for efficient training and robustness under class imbalance, but dynamic pruning is a more mature niche; its impact is likely narrower to supervised classification pipelines.

gpt-5.2·Jun 11, 2026

Lost vs. Implicit Neural Representations of Individual Behavior

Paper 2 is likely to have higher scientific impact due to greater novelty (adapting implicit neural representations to policy/behavior modeling with episode-level latent modulation), broader cross-domain relevance (robotics, games, racing, chess, demonstrations), and timely alignment with self-supervised learning on heterogeneous unlabeled behavior data. It also introduces a useful OOD taxonomy at the policy level. Paper 1 is practically valuable and methodically solid for efficient, robust training via class-aware dynamic pruning, but its conceptual scope is narrower and more incremental within data pruning/efficiency literature.

gpt-5.2·Jun 11, 2026

Won vs. Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

RCAP addresses a fundamental and broadly applicable problem in machine learning—efficient training via dynamic dataset pruning with robustness to class imbalance. It demonstrates consistent improvements across 6 datasets, 5 models, and 3 training paradigms, showing strong generalizability. The finding that 10% of data can outperform full training on imbalanced datasets is practically significant, offering ~8.7x speedups. In contrast, Claw-SWE-Bench is a narrower benchmark contribution for evaluating coding agents, primarily serving the SWE-bench community. RCAP's broader applicability across ML domains and its methodological contribution give it higher potential impact.

claude-opus-4-6·Jun 11, 2026

Won vs. Pre-AF 13: An Interpretable Atrial Fibrillation Risk Score Mined from Discharge Reports

Paper 2 likely has higher scientific impact: it introduces a broadly applicable, novel training-time algorithm (class-aware probabilistic dynamic pruning) with strong claimed gains in efficiency and worst-group accuracy across multiple datasets, models, and paradigms, increasing potential adoption across ML and applied domains. Its methodological contribution generalizes beyond a single institution or disease context and is timely given compute constraints and fairness/robustness concerns. Paper 1 is clinically relevant and interpretable but is single-center, retrospective, and may face generalizability and deployment barriers, limiting breadth of impact.

gpt-5.2·Jun 11, 2026

Lost vs. On Subquadratic Architectures: From Applications to Principles

Paper 2 addresses a fundamental and timely question in deep learning—understanding which subquadratic architectures best replace Transformers—with broad implications for LLMs, time-series modeling, and efficient AI. Its unified theoretical framework comparing xLSTM, Mamba-2, and Gated DeltaNet, combined with principled analysis of memory dynamics and state tracking, provides foundational insights for architecture design. Paper 1 offers a solid incremental contribution to dataset pruning with class-aware sampling, but its scope is narrower (classification pruning) and impact more specialized compared to Paper 2's influence on the widely studied efficient sequence modeling paradigm.

claude-opus-4-6·Jun 11, 2026
