Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou
Occlusion-based attribution methods provide an intuitive way to estimate feature importance by perturbing input features and measuring the resulting change in model output. However, their reliability is strongly affected by how feature removal is implemented: externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, while in nonlinear models the occlusion of a set of features can also alter the contribution of non-occluded features. We refer to this effect as attribution shift, as the attribution scores of the non-occluded features drift from their initial values. To address these sources of instability, we introduce XtrAIn, a training-guided attribution method that transfers the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation for reducing computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines. Overall, XtrAIn provides a training-aware perspective on feature attribution and offers a useful diagnostic tool for studying how feature-level evidence is formed during training.
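The attribution shift described in the abstract can be illustrated with a toy example. The sketch below is an illustration only and is not taken from the paper: the two models, the zero baseline, and the helper names `occlusion_score` and `attribution_shift` are all assumptions made here. It compares the occlusion score of one feature before and after a second feature has been occluded.

```python
import numpy as np

def occlusion_score(f, x, idx, baseline=0.0):
    """Output drop when feature `idx` is replaced by `baseline`."""
    x_occ = x.copy()
    x_occ[idx] = baseline
    return float(f(x) - f(x_occ))

def attribution_shift(f, x, idx, occluded, baseline=0.0):
    """Change in the score of feature `idx` once feature `occluded` is removed."""
    x_partial = x.copy()
    x_partial[occluded] = baseline
    before = occlusion_score(f, x, idx, baseline)
    after = occlusion_score(f, x_partial, idx, baseline)
    return before - after

def linear_model(x):
    # Purely additive: each feature contributes independently of the others.
    return 2.0 * x[0] + 3.0 * x[1]

def relu_model(x):
    # Same weights, followed by a ReLU with a negative bias (hypothetical toy model).
    return max(0.0, 2.0 * x[0] + 3.0 * x[1] - 4.0)

x = np.array([1.0, 1.0])
shift_linear = attribution_shift(linear_model, x, idx=0, occluded=1)
shift_relu = attribution_shift(relu_model, x, idx=0, occluded=1)
print("linear shift:", shift_linear)
print("relu shift:  ", shift_relu)
```

In the additive model the score of feature 0 does not depend on whether feature 1 is present, so the shift is zero. In the ReLU model, removing feature 1 pushes the pre-activation below zero, and the score of feature 0 changes even though feature 0 itself was never touched.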
XtrAIn proposes shifting the occlusion operation from input space to parameter space for feature attribution. Rather than replacing input features with baseline values (the standard approach), the method tracks how feature-associated weight updates across training steps affect the model's logits. The key insight is that gradient-based weight updates are inherently "independent" (each is computed via the chain rule with the other parameters held fixed), which may avoid the "attribution shift" problem, in which occluding one feature in input space changes the effective contribution of non-occluded features in nonlinear models.
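The general idea of occluding in parameter space can be sketched for a small fully connected network. This is a loose illustration under stated assumptions, not the authors' algorithm: it assumes that the parameters "associated" with input feature j are column j of the first-layer weight matrix, and that a per-step score is the change in the target logit when only that column's update is undone. The functions `train_with_snapshots` and `trajectory_scores`, the toy data, and the `stride` argument (which subsamples checkpoints) are all hypothetical.

```python
import numpy as np

def forward(W1, W2, x):
    """Logits of a one-hidden-layer tanh network."""
    return W2 @ np.tanh(W1 @ x)

def train_with_snapshots(X, y, hidden=8, classes=2, steps=300, lr=0.5, seed=0):
    """Full-batch gradient descent that stores the weights after every step."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (hidden, X.shape[1]))
    W2 = rng.normal(0.0, 0.5, (classes, hidden))
    Y = np.eye(classes)[y]
    snaps = [(W1.copy(), W2.copy())]
    for _ in range(steps):
        H = np.tanh(X @ W1.T)                      # hidden activations
        Z = H @ W2.T                               # logits
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        G = (P - Y) / len(X)                       # cross-entropy gradient w.r.t. logits
        grad_W2 = G.T @ H
        grad_W1 = ((G @ W2) * (1.0 - H ** 2)).T @ X
        W1 = W1 - lr * grad_W1
        W2 = W2 - lr * grad_W2
        snaps.append((W1.copy(), W2.copy()))
    return snaps

def trajectory_scores(snaps, x, target, stride=1):
    """Accumulate, over consecutive checkpoints, the target-logit change
    caused by undoing the update of each feature's first-layer column."""
    picked = snaps[::stride]
    if picked[-1] is not snaps[-1]:
        picked.append(snaps[-1])                   # always end at the final weights
    scores = np.zeros(x.shape[0])
    for (W1_prev, _), (W1_next, W2_next) in zip(picked[:-1], picked[1:]):
        full = forward(W1_next, W2_next, x)[target]
        for j in range(x.shape[0]):
            W1_occ = W1_next.copy()
            W1_occ[:, j] = W1_prev[:, j]           # occlude feature j's update only
            scores[j] += full - forward(W1_occ, W2_next, x)[target]
    return scores

# Toy data (assumption): only feature 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
snaps = train_with_snapshots(X, y)

x = np.array([1.5, 0.2, -0.1, 0.0])
dense = trajectory_scores(snaps, x, target=1, stride=1)    # every training step
coarse = trajectory_scores(snaps, x, target=1, stride=10)  # every tenth checkpoint
print("dense :", np.round(dense, 3))
print("coarse:", np.round(coarse, 3))
```

No input value is ever replaced by a baseline here; the sample `x` is always evaluated as is, and only weight updates are reverted. A feature whose input value is exactly zero receives a zero score in this sketch, because its first-layer column has no effect on the logits for that sample.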
The paper introduces three variants: XtrAIn (full trajectory accumulation), Xstep (lightweight approximation using selected checkpoints), and XtrAIn+ (target-class-focused variant filtering for positive target updates). It also proposes CleanScore, a metric for evaluating attribution cleanness when signal/background regions are known.
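The summary does not give the definition of CleanScore, so the sketch below does not reproduce it. It shows one simple cleanness-style measure that could be used when the signal region is known: the fraction of absolute attribution mass that falls inside the signal mask. The function name `signal_mass_fraction` and the example arrays are assumptions made for illustration.

```python
import numpy as np

def signal_mass_fraction(attribution, signal_mask):
    """Share of total |attribution| that lies inside the known signal region.

    Returns a value in [0, 1]; 1.0 means no attribution leaks into the background.
    This is a hypothetical stand-in, not the paper's CleanScore.
    """
    magnitude = np.abs(np.asarray(attribution, dtype=float))
    mask = np.asarray(signal_mask, dtype=bool)
    total = magnitude.sum()
    if total == 0.0:
        return 0.0                      # an all-zero map carries no signal
    return float(magnitude[mask].sum() / total)

# Toy 3x3 attribution map whose true signal is the centre pixel only.
attr = np.array([[0.0, 0.5, 0.0],
                 [0.5, 3.0, 0.5],
                 [0.0, 0.5, 0.0]])
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True

score = signal_mass_fraction(attr, mask)
print("signal mass fraction:", score)
```

A measure of this kind rewards maps that concentrate attribution on the known signal, but it only applies to controlled datasets where the signal and background regions are known in advance.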
The conceptual reframing—from "how to explain a trained model" to "how to explain a model update"—is intellectually interesting and represents a genuinely different perspective on attribution.
Strengths in formulation: The mathematical framework is clearly presented with explicit assumptions (Assumptions 1-6), and the Inverse Property (Criterion 1) is formally proven. The symmetry of the forward/reverse parameter-space occlusion (Eqs. 4-5) is well-motivated.
The theoretical contribution—reconceptualizing attribution through training dynamics—opens an interesting research direction. The training-aware perspective could be valuable for:
However, the practical impact is severely limited by the FCNN restriction. Modern deep learning predominantly uses architectures where the input-to-weight mapping is not one-to-one (convolutions share weights across spatial locations; attention mechanisms have no fixed feature-weight association). Without a credible extension strategy, the method remains a proof-of-concept for a narrow architecture class.
The paper addresses real and recognized problems in XAI: baseline sensitivity, OoD artifacts, and evaluation circularity. These are current bottlenecks. The training-dynamics perspective connects to the growing interest in developmental/mechanistic interpretability. However, the field has largely moved beyond simple FCNNs, making the paper's scope feel dated despite addressing timely conceptual questions.
The paper builds on a very recent preprint by the same authors [37], which limits the novelty somewhat. The relationship between XtrAIn and influence functions or TracIn—which also trace training dynamics to explain predictions—is not discussed, representing a significant gap in the related work.
The qualitative results (Fig. 7) do show visually cleaner attribution maps, but the quantitative advantage is modest and measured only on the authors' own metric.
Generated Jun 10, 2026
Paper 1 introduces a fundamentally novel approach to feature attribution by shifting occlusion from input space to parameter space, addressing well-known problems (baseline bias, out-of-distribution artifacts, attribution shift) in explainable AI. This methodological innovation has broad applicability across many domains. Paper 2, while well-executed, applies existing techniques (attention mechanisms, ordinal regression, multimodal fusion, Grad-CAM++, SHAP) to AD staging in a relatively incremental manner. Paper 1's conceptual contribution—training-guided attribution—opens new research directions in interpretability, giving it higher potential impact.
Paper 2 introduces a novel and rigorous theoretical framework for capacity-constrained online convex optimization with delayed feedback, addressing a practical gap in online learning theory. It provides the first regret guarantees for this setting, introduces a new semi-clairvoyant model, and establishes clean theoretical results showing graceful degradation. The work has broad applicability across online learning, optimization, and distributed systems. Paper 1, while addressing an important XAI problem, offers a more incremental contribution to the crowded feature attribution space with primarily empirical validation on limited datasets.
Paper 1 addresses a fundamental architectural component (routers) of Mixture-of-Experts models, which are central to modern large-scale LLMs (e.g., GPT-4, Mixtral). The principled redesign using manifold power iteration to align router rows with principal singular directions of experts is novel, theoretically grounded, and empirically validated at scale (1B-11B parameters). This has broad, immediate applicability to the rapidly growing MoE ecosystem. Paper 2 contributes a useful feature attribution method, but operates in the more incremental XAI/interpretability space with narrower scope and smaller-scale experiments.
Paper 2 is more novel in reframing occlusion attribution from input perturbations to training-guided parameter-space updates, directly addressing key known failure modes (baseline bias, OOD artifacts, attribution shift). The application space (explainable AI) is broad and timely across high-stakes domains, with clear practical relevance (e.g., medical subtype classification). It introduces a core method plus a lightweight approximation (Xstep) and a target-focused variant (XtrAIn+), which may support wider adoption. Paper 1 is useful and rigorous but mainly consolidates and simplifies the design of an existing technique, with narrower cross-field impact.
Paper 1 introduces a conceptually novel, training-guided occlusion mechanism that shifts attribution from input perturbations to parameter-space updates, directly addressing known failure modes (baseline bias, OOD artifacts, attribution shift). This is methodologically and conceptually relevant across many model families and application domains where interpretability is critical (e.g., healthcare), giving broader cross-field impact. Paper 2 is timely and practically valuable for deploying a specific large diffusion model on consumer GPUs, but it is more engineering-focused and tied to a specific model and hardware setting, with narrower generalizability and more limited long-term scientific reach.
Paper 1 introduces a principled continuous encoding for the Euler Characteristic Transform with systematic architectural comparisons across diverse data modalities, advancing topological data analysis for machine learning. Its contributions—continuous tokenization, modular pipeline design, and comprehensive benchmarking across point clouds, graphs, meshes, and cubical complexes—have broader methodological impact across multiple fields. Paper 2 addresses an important but narrower problem in explainability (occlusion-based attribution), offering incremental improvements with limited experimental scope (image datasets and one biomedical task). Paper 1's mathematical rigor and cross-domain applicability give it higher potential impact.
Paper 2 is more novel and broadly impactful: it reframes occlusion attribution by moving perturbations from input space to parameter/training-trajectory space, directly addressing baseline bias and introducing “attribution shift” as a general issue. Interpretability methods are widely applicable across domains and timely for trustworthy ML, with clear practical use in debugging and high-stakes settings (e.g., cancer classification). It also proposes variants (Xstep, XtrAIn+) suggesting methodological extensibility. Paper 1 is rigorous but more domain-specific (trajectory augmentation) and its benefits are explicitly conditional, likely limiting uptake and cross-field influence.
Paper 2 is likely higher impact: it proposes a broadly applicable surrogate modeling framework for stochastic/chaotic/turbulent dynamical systems with direct relevance to physics, climate, fluid dynamics, and uncertainty quantification. The method avoids estimating drift/diffusion/score, offers stability analysis, and targets ensemble statistics and current-like quantities—capabilities valuable across many scientific domains and timely for fast simulation and surrogate modeling. Paper 1 is a novel XAI contribution with practical utility, but its impact is narrower (primarily ML interpretability) and may face adoption barriers due to training-trajectory dependence and compute overhead.
XtrAIn introduces a fundamentally novel perspective on feature attribution by shifting occlusion from input space to parameter space, addressing well-known limitations (baseline bias, OOD samples, attribution shift) in explainability methods. This has broader cross-domain impact spanning interpretable ML, medical AI, and trustworthy AI. Paper 1, while solid, represents an incremental architectural contribution in EEG emotion recognition—a narrower application domain with many competing transformer-based approaches. XtrAIn's methodological innovation and applicability across model types and domains give it higher potential impact.
Paper 2 targets a timely and widely used training paradigm (on-policy distillation for LLMs/RLHF-like settings) and identifies a concrete failure mode (low-KL agreement trap) with a simple, actionable termination rule that improves accuracy while cutting compute. The contribution is broadly applicable across model families and tasks that use teacher scoring of student rollouts, giving it strong real-world relevance and cross-field impact. Paper 1 is novel for interpretability but is more specialized, potentially heavier to deploy, and likely narrower in downstream adoption.