Veerendhra Kumar Dangeti, Xiao Gu, Ying Weng, Shreyank N Gowda
Training deep neural networks for clinical time-series analysis is computationally demanding, yet many healthcare settings lack the resources required for repeated model development and deployment. This challenge is particularly evident in electrocardiogram classification, where large datasets and long training schedules make efficiency practically important. Progressive Data Dropout reduces training cost by excluding samples from gradient updates once they are learned, but it relies on model confidence and may retain samples that are difficult due to noise or ambiguity rather than useful signal. In this work, we introduce ERTS, an explainability-based reliability training signal for efficient ECG classification. ERTS uses explanation quality during training to distinguish between informative and unreliable uncertainty. Building on progressive data selection, we compute Grad-CAM attention maps for candidate samples and derive a focus score that measures whether model predictions are supported by coherent and localised patterns. Samples with low focus are filtered out, while those with meaningful attention are prioritised for gradient updates. We evaluate ERTS across three ECG datasets and multiple backbone architectures, showing consistent improvements in macro-F1 alongside reduced effective training cost. These results suggest that explanation quality can serve as a practical signal for improving both efficiency and reliability in clinical time-series learning. Code will be released.
ERTS proposes augmenting Progressive Data Dropout (PDD) with a Grad-CAM-based "focus score" that serves as a training-time reliability signal. The core idea is that among uncertain samples (those not yet confidently classified), some are uncertain because they contain meaningful but under-learned patterns, while others are uncertain because of noise, label ambiguity, or artifacts. ERTS distinguishes between these two cases by computing Grad-CAM attention maps and measuring their spatial concentration (via the mean intensity of the top 10% most salient regions). Samples with diffuse attention are filtered out; those with focused attention are retained for gradient updates.
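The focus score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. `grad_cam_1d` and `focus_score` are hypothetical names. The Grad-CAM step follows the standard recipe (gradients averaged over positions give channel weights; the map is the ReLU of the weighted channel sum). The map is normalized to sum to one before taking the top 10%; this is an assumption chosen so that a concentrated map scores higher than a diffuse one (with min-max normalization, a uniformly high map would score near the maximum instead). Positions can be time steps of a 1D feature map or a flattened 2D grid, since the listed backbones are image architectures.

```python
from typing import List, Sequence


def grad_cam_1d(activations: Sequence[Sequence[float]],
                gradients: Sequence[Sequence[float]]) -> List[float]:
    """Grad-CAM over a flattened position axis.

    activations[k][t]: feature map of channel k at position t.
    gradients[k][t]: gradient of the class score w.r.t. activations[k][t].
    """
    num_positions = len(activations[0])
    # Channel weight alpha_k: gradient averaged over all positions.
    weights = [sum(g) / num_positions for g in gradients]
    # Weighted sum of channels at each position, followed by ReLU.
    return [max(0.0, sum(w * a[t] for w, a in zip(weights, activations)))
            for t in range(num_positions)]


def focus_score(cam: Sequence[float], top_fraction: float = 0.10) -> float:
    """Mean intensity of the top `top_fraction` positions of the map.

    Assumption: the map is first normalized to sum to one, so a map whose
    mass sits in a few positions scores high and a diffuse map scores low.
    """
    total = sum(cam)
    if total <= 0.0:
        return 0.0  # empty map: no supporting evidence at all
    probs = [v / total for v in cam]
    # Number of positions in the top fraction (at least one).
    k = max(1, round(top_fraction * len(probs)))
    top = sorted(probs, reverse=True)[:k]
    return sum(top) / k
```

A sample's score would then be `focus_score(grad_cam_1d(acts, grads))`, with `acts` and `grads` taken from a chosen layer for the predicted or true class.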
The conceptual contribution, using explanation quality as a training-time data selection criterion rather than purely as a post-hoc interpretability tool, is genuinely interesting and represents a meaningful shift in how explainability methods can be employed. However, the technical novelty is relatively modest: the method is a two-stage filter combining an existing PDD framework with a simple Grad-CAM concentration metric.
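The two-stage structure can be sketched as follows, assuming a single confidence threshold as a stand-in for PDD (the DBPD, SMRD, and SRD variants differ in how they decide that a sample is learned). `SampleStats`, `select_for_update`, and both threshold defaults are hypothetical placeholders, not values reported in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SampleStats:
    index: int
    confidence: float  # assumed: model probability for the true label
    focus: float       # focus score of the sample's Grad-CAM map


def select_for_update(stats: Sequence[SampleStats],
                      conf_threshold: float = 0.9,
                      focus_threshold: float = 0.3) -> List[int]:
    """Return indices of samples that receive a gradient update.

    Both thresholds are hypothetical placeholders.
    """
    # Stage 1 (PDD-style): skip samples the model already classifies confidently.
    uncertain = [s for s in stats if s.confidence < conf_threshold]
    # Stage 2 (ERTS-style): among uncertain samples, keep only those whose
    # attention is concentrated enough to suggest a learnable pattern.
    return [s.index for s in uncertain if s.focus >= focus_threshold]
```

Ordering the confidence cut first means Grad-CAM only needs to be computed for samples that survive stage 1, which is where the efficiency argument comes from.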
Experimental breadth is a strength. The paper evaluates across three ECG datasets (PTB-XL, CPSC 2018, Georgia 2020), three backbone architectures (EfficientNetV2-S, ResNet-18, MobileNetV2), and multiple PDD variants (DBPD, SMRD, SRD) with various threshold settings. This combinatorial evaluation (9 dataset-backbone pairs, dozens of configurations) lends credibility to the generalization claims.
The paper addresses a real need: training efficiency in clinical ML settings where computational resources are constrained. The idea of using explanation quality during training (rather than only post-hoc) has broader applicability beyond ECG classification—potentially extending to medical imaging, EHR analysis, and other clinical time-series domains.
However, the practical impact may be limited by:
The paper is timely in two respects: (1) growing interest in green/efficient AI and (2) increasing emphasis on trustworthy clinical AI. The intersection of explainability and training efficiency is underexplored, making the conceptual framing relevant. ECG classification is a well-motivated application domain given the scale of cardiac data and the resource constraints in many healthcare settings.
However, the paper does not engage with recent curriculum learning advances, data influence methods, or active learning literature that also addresses the question of which samples to prioritize during training.
The paper reads more as an empirical study than a methodological contribution. The extensive tables (roughly 8 pages) report many configurations, but the core insight could be conveyed more concisely. The writing is clear but repetitive. The related work adequately contextualizes the contribution but undersells its connection to the active learning and data valuation literatures.
The class-level analysis (Section 4.8) is the most compelling part of the paper. It shows that ERTS preferentially removes NORM samples with diffuse attention while preserving pathological classes, which provides mechanistic insight into why the method works.
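A class-level analysis of this kind reduces to a per-class removal rate. The sketch below is an assumed way to tabulate it, not the paper's analysis code; `removal_rate_by_class` is a hypothetical helper that takes each sample's label and whether the focus filter kept it.

```python
from collections import Counter
from typing import Dict, Sequence


def removal_rate_by_class(labels: Sequence[str],
                          kept: Sequence[bool]) -> Dict[str, float]:
    """Fraction of each class's samples excluded by the focus filter."""
    total = Counter(labels)
    # Count only samples the filter did not keep.
    removed = Counter(lab for lab, k in zip(labels, kept) if not k)
    # Counter returns 0 for classes with no removed samples.
    return {lab: removed[lab] / n for lab, n in total.items()}
```

Comparing these rates across classes (and against each class's share of the dataset) is what would show whether removal is concentrated in NORM rather than spread evenly.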
Generated Jun 11, 2026
Paper 1 has higher potential impact because it introduces a training-time mechanism that leverages explainability (Grad-CAM focus) as a reliability/efficiency signal, addressing a concrete bottleneck in clinical ECG deployment (compute constraints, noisy/ambiguous samples). It is timely (trustworthy/efficient medical AI), has clear real-world applicability, and could generalize to other clinical time-series tasks. Paper 2’s distributional loss is broadly applicable, but the abstract is less specific about rigor/validation and similar “soft target/label smoothing/ambiguity-aware” losses exist, reducing perceived novelty.
Paper 2 demonstrates higher potential scientific impact due to its broad applicability and conceptual novelty. While Paper 1 introduces a valuable, domain-specific architectural improvement for neural operators solving PDEs, Paper 2 innovatively bridges explainable AI (XAI) and efficient training by using explanation quality as an active data-pruning signal. This approach addresses critical real-world challenges in clinical machine learning (computational constraints and noisy datasets), offering immediate translational value for healthcare applications while contributing a novel methodology to the broader fields of trustworthy and efficient deep learning.
Paper 2 presents a highly innovative approach by using explainability (XAI) not just for post-hoc analysis, but as an active training signal to improve data efficiency and model reliability. Its application to ECG classification addresses a critical real-world bottleneck in healthcare AI: computational constraints and noisy clinical data. While Paper 1 offers solid theoretical improvements to multimodal VAEs, Paper 2 demonstrates broader translational impact, directly benefiting clinical ML deployment and advancing the integration of explainability into the model optimization process.
Paper 2 addresses a fundamental and broadly applicable problem in ensemble learning with a mathematically rigorous solution. The identification and resolution of the 'L1-simplex paradox' is a genuine theoretical contribution. SCSB is model-agnostic, applicable across Random Forests, Bagged SVMs, and Neural Networks, giving it broad impact across machine learning. The 96% compression with maintained accuracy has significant practical implications. Paper 1, while useful, is more incremental—combining existing techniques (Grad-CAM, progressive data dropout) in a specific domain (ECG classification) with relatively narrow applicability.
Paper 2 addresses a fundamental and broadly applicable challenge: continual anomaly detection across heterogeneous tabular data with varying schemas. Its comprehensive approach, combining alignment, augmentation, and distillation, evaluated across 21 diverse datasets, offers significantly broader cross-disciplinary impact than Paper 1, which focuses on a domain-specific efficiency improvement for ECG classification.
Paper 2 addresses a novel and underexplored security vulnerability in GNN calibration, combining adversarial robustness with calibration, two critical topics in trustworthy AI. It provides theoretical insights linking generalization and calibration vulnerability, introduces a comprehensive framework (UGCA) with multiple technical innovations, and has broader implications for safety-critical AI deployment. Paper 1, while practically useful for ECG efficiency, is more incremental: it combines existing techniques (Grad-CAM, progressive data dropout) in a narrower clinical domain. Paper 2's findings about fundamental model vulnerabilities have wider cross-domain relevance.
Paper 1 proposes a highly innovative framework for automated science that generates hypotheses and designs experiments. Its potential to accelerate mechanistic modeling across various scientific domains gives it a much larger breadth of impact and transformative potential compared to Paper 2, which focuses on a narrower methodological improvement for training efficiency in ECG classification.
Paper 1 likely has higher scientific impact due to greater methodological novelty (Riemannian manifold formulation with Fisher–Rao metric, projectors/retractions/HVPs, rank-sufficiency certificate) and broader applicability across OT variants (linear OT, GW, fused GW, balanced/unbalanced). Its contributions are foundational and can influence multiple fields (optimization, geometry, ML, graphics, computational biology). Paper 2 is timely and practically relevant for clinical ECG efficiency, but its core idea (using Grad-CAM-based filtering as a training signal) is more incremental and domain-specific, with narrower cross-field impact.
Paper 2 likely has higher scientific impact due to a concrete, novel training-time mechanism (an explainability-derived reliability signal) with demonstrated empirical gains across multiple ECG datasets and architectures and clear real-world relevance in resource-constrained clinical ML. It combines efficiency, reliability, and interpretability (timely needs in healthcare AI) and offers a deployable methodology with a promised code release. Paper 1 is valuable as a conceptual framework for uncertainty in dynamical systems, but appears more survey- or position-oriented, with less immediate methodological or application impact unless it introduces new formalism or validated tools.
Paper 1 addresses a fundamental bottleneck in modern AI, the quadratic scaling of Transformers, by evaluating and theoretically unifying subquadratic alternatives such as xLSTM and Mamba-2. Its findings on state tracking and memory dynamics have broad implications across multiple domains, including NLP, code generation, and time-series analysis. Paper 2 offers a valuable but more niche application of explainability for efficient ECG training. Because Paper 1 tackles a core architectural challenge with field-wide relevance and high timeliness, it has significantly higher potential scientific impact.