XtrAIn: Training-Guided Occlusion for Feature Attribution

Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou

Jun 9, 2026 · arXiv:2606.10877v1

cs.LG · cs.CV

#4054 of 5669 · cs.LG

Tournament Score: 1344 ± 44 (scale 1050 to 1750)

Win Rate: 42%

Rating: 3.8 / 10

Significance 4 · Rigor 4 · Novelty 5.5 · Clarity 6

Abstract

Occlusion-based attribution methods provide an intuitive way to estimate feature importance by perturbing input features and measuring the resulting change in model output. However, their reliability is strongly affected by how feature removal is implemented: externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, while in nonlinear models the occlusion of a set of features can also alter the contribution of non-occluded features. We refer to this effect as attribution shift, as the attribution scores of the non-occluded features drift from their initial values. To challenge these major issues that render explanations unstable, we introduce XtrAIn, a training-guided attribution method that transfers the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation for reducing computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines. Overall, XtrAIn provides a training-aware perspective on feature attribution and offers a useful diagnostic tool for studying how feature-level evidence is formed during training.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: XtrAIn: Training-Guided Occlusion for Feature Attribution

1. Core Contribution

XtrAIn proposes shifting the occlusion operation from input space to parameter space for feature attribution. Rather than replacing input features with baseline values (the standard approach), the method tracks how feature-associated weight updates across training steps affect model logits. The key insight is that gradient-based weight updates are inherently "independent": each parameter's update comes from a partial derivative, computed via the chain rule with all other parameters held fixed. This potentially avoids the "attribution shift" problem, in which occluding one feature in input space changes the effective contribution of non-occluded features in nonlinear models.
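The attribution shift effect can be reproduced on a tiny example. The sketch below is an illustration only, not the paper's Fig. 1: the one-unit ReLU model `f`, the zero baseline, and the helper `occlusion_attr_x1` are all assumptions chosen for this note.

```python
def f(x1, x2):
    """Toy nonlinear model (hypothetical): a single thresholded ReLU unit."""
    return max(0.0, x1 + x2 - 1.0)


def occlusion_attr_x1(x1, x2, baseline=0.0):
    """Input-space occlusion score of x1: output drop when x1 is set to the baseline."""
    return f(x1, x2) - f(baseline, x2)


# Score of x1 while x2 is still present.
attr_x1_full = occlusion_attr_x1(1.0, 1.0)
# Score of the same x1 after x2 has been occluded (set to the baseline 0.0).
attr_x1_after = occlusion_attr_x1(1.0, 0.0)
# The difference is the shift in x1's attribution caused by occluding x2.
shift = attr_x1_full - attr_x1_after

print(attr_x1_full, attr_x1_after, shift)  # prints "1.0 0.0 1.0"
```

Here x1 itself never changes, yet its occlusion score drops from 1.0 to 0.0 once x2 is removed, because the ReLU threshold is no longer crossed.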

The paper introduces three variants: XtrAIn (full trajectory accumulation), Xstep (lightweight approximation using selected checkpoints), and XtrAIn+ (target-class-focused variant filtering for positive target updates). It also proposes CleanScore, a metric for evaluating attribution cleanness when signal/background regions are known.
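To make the idea of trajectory accumulation concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation: the two-layer network, the random stand-in for optimizer steps (`fake_step`), and the rule "substitute only column i of the first-layer weights and record the logit change" are simplifications made for this note, and only the forward substitution direction is shown.

```python
import numpy as np


def forward(params, x):
    """Two-layer fully connected network with ReLU; returns logits."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2


def step_attribution(old, new, x):
    """Logit change per feature when only feature i's first-layer weights
    (column i of W1) are moved from their old to their updated values."""
    base = forward(old, x)
    attr = np.zeros((x.size, base.size))
    for i in range(x.size):
        W1 = old[0].copy()
        W1[:, i] = new[0][:, i]  # substitute only feature i's weights
        attr[i] = forward((W1, *old[1:]), x) - base
    return attr


def trajectory_attribution(checkpoints, x):
    """Sum the per-step attributions over consecutive checkpoints."""
    total = np.zeros((x.size, forward(checkpoints[0], x).size))
    for old, new in zip(checkpoints, checkpoints[1:]):
        total += step_attribution(old, new, x)
    return total


def fake_step(params, rng, scale=0.1):
    """Hypothetical stand-in for one optimizer step: a small random update."""
    return tuple(p + scale * rng.normal(size=p.shape) for p in params)


rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 5, 3
params = (rng.normal(size=(n_hid, n_in)), np.zeros(n_hid),
          rng.normal(size=(n_out, n_hid)), np.zeros(n_out))
checkpoints = [params]
for _ in range(3):
    checkpoints.append(fake_step(checkpoints[-1], rng))

x = np.array([1.0, 0.0, 2.0, -1.0])
total = trajectory_attribution(checkpoints, x)  # shape (n_in, n_out)
```

In this sketch a feature whose input value is zero (here `x[1]`) receives zero attribution at every step, since changing its first-layer weights cannot change the hidden pre-activations. Using only a subset of checkpoints would reduce the number of forward passes, at the price of a coarser trajectory.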

The conceptual reframing—from "how to explain a trained model" to "how to explain a model update"—is intellectually interesting and represents a genuinely different perspective on attribution.

2. Methodological Rigor

Strengths in formulation: The mathematical framework is clearly presented with explicit assumptions (Assumptions 1-6), and the Inverse Property (Criterion 1) is formally proven. The symmetry of the forward/reverse parameter-space occlusion (Eqs. 4-5) is well-motivated.

Significant weaknesses:

Restricted to FCNNs: The method is defined only for fully connected neural networks, where input features map directly to first-layer weights. This is a severe limitation in an era dominated by CNNs, transformers, and other architectures where input-weight associations are far less straightforward. The authors acknowledge this but don't provide a clear path to extension.

Assumptions are strong and sometimes circular: Assumption 4 (logit changes represent parameter update effects) essentially defines the attribution rule rather than deriving it from first principles. The independence argument (Section 3.3.1) conflates the mathematical structure of gradient computation with true causal independence of feature effects.

Attribution shift analysis is incomplete: The attribution shift concept is illustrated with a single toy example (Fig. 1) rather than formally characterized. The claim that parameter-space occlusion eliminates attribution shift lacks rigorous proof—it's argued by analogy to gradient independence, but this doesn't account for the nonlinear interactions that occur when weights are substituted in later layers.

Experimental evaluation is narrow: Only simple FCNN models on MNIST variants and a 50-gene dataset are tested. Models achieve >90% accuracy on simple tasks, and the AMNIST dataset where the model fails (25% accuracy) is excluded from quantitative evaluation. The CleanScore metric requires known signal/background decomposition, limiting its applicability.

CleanScore circularity concerns: The metric assumes that informative pixels are confined to a known central region, which is dataset-specific rather than general. The metric rewards background silence, which inherently favors XtrAIn's design (parameter-space occlusion naturally avoids perturbing uninformative features that receive small weight updates).
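The concern about rewarding background silence can be illustrated with a stand-in metric. The function below is not the paper's CleanScore, whose exact definition is not reproduced in this assessment; `background_leakage` is a hypothetical measure that assumes a known boolean signal mask and reports the share of absolute attribution mass outside it.

```python
import numpy as np


def background_leakage(attr_map, signal_mask):
    """Fraction of absolute attribution mass that falls outside the signal region.

    Hypothetical stand-in metric: 0.0 means all attribution lies inside the mask.
    """
    mass = np.abs(attr_map)
    total = mass.sum()
    if total == 0:
        return 0.0  # an all-zero map leaks nothing by this definition
    return float(mass[~signal_mask].sum() / total)


# Assumed 6x6 image whose informative pixels sit in the central 2x2 block.
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True

clean = np.where(mask, 1.0, 0.0)  # attribution only on the signal pixels
noisy = clean + 0.125             # same map plus uniform background attribution

leak_clean = background_leakage(clean, mask)
leak_noisy = background_leakage(noisy, mask)
```

Any score of this kind is only as meaningful as the mask: it presumes the true evidence is confined to the marked region, and it favors methods that stay silent on the background regardless of whether the signal attributions are themselves faithful.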

3. Potential Impact

The theoretical contribution—reconceptualizing attribution through training dynamics—opens an interesting research direction. The training-aware perspective could be valuable for:

Model diagnostics: The PAM50 experiment demonstrates detecting when a model has learned only one class (XtrAIn+ assigning zero importance), which has practical value for safety-critical applications.

Understanding learning dynamics: Intermediate attribution patterns (Fig. 3) provide a window into how feature importance evolves during training.

However, the practical impact is severely limited by the FCNN restriction. Modern deep learning predominantly uses architectures where the input-to-weight mapping is not one-to-one (convolutions share weights across spatial locations; attention mechanisms have no fixed feature-weight association). Without a credible extension strategy, the method remains a proof-of-concept for a narrow architecture class.
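The weight-sharing obstacle can be seen by writing a small convolution as its equivalent dense matrix. This is a generic illustration, not taken from the paper; the helper `conv1d_as_matrix` and the kernel values are assumptions made for the example.

```python
import numpy as np


def conv1d_as_matrix(kernel, n_in):
    """Dense matrix equivalent to a 1D cross-correlation with stride 1 and no padding."""
    k = kernel.size
    n_out = n_in - k + 1
    M = np.zeros((n_out, n_in))
    for r in range(n_out):
        M[r, r:r + k] = kernel  # the same kernel weights reappear in every row
    return M


kernel = np.array([1.0, 2.0, 3.0])
M = conv1d_as_matrix(kernel, n_in=6)

# Number of input positions each kernel weight multiplies.
inputs_per_weight = [int((M == w).sum()) for w in kernel]
# Number of kernel weights acting on each input feature.
weights_per_input = (M != 0).sum(axis=0).tolist()

print(inputs_per_weight)   # prints "[4, 4, 4]"
print(weights_per_input)   # prints "[1, 2, 3, 3, 2, 1]"
```

In a fully connected first layer, column i of the weight matrix belongs to input feature i alone. Here each of the three kernel weights acts on four different inputs, so there is no single set of weights to substitute for "feature i".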

4. Timeliness & Relevance

The paper addresses real and recognized problems in XAI: baseline sensitivity, OoD artifacts, and evaluation circularity. These are current bottlenecks. The training-dynamics perspective connects to the growing interest in developmental/mechanistic interpretability. However, the field has largely moved beyond simple FCNNs, making the paper's scope feel dated despite addressing timely conceptual questions.

5. Strengths & Limitations

Key Strengths:

Novel conceptual framework transferring occlusion to parameter space

Clean formalization with explicit assumptions

Inverse Property provides theoretical grounding

XtrAIn+ demonstrates practical diagnostic value (PAM50 experiment)

Honest discussion of SRG metric limitations and evaluation circularity

Notable Weaknesses:

Architecture limited to FCNNs—no path to CNNs, transformers, or other modern architectures

Full XtrAIn requires storing all training checkpoints and running 2N forward passes per training step (N = number of input features), which is impractical at scale

Evaluation datasets are overly simple; no comparison on standard XAI benchmarks

The loss disentanglement (Appendix A.2) is acknowledged as approximate but treated as exact in practice

Missing comparison with data-influence methods (e.g., TracIn, influence functions) that similarly use training trajectory information

No user study or downstream task evaluation to validate that "cleaner" explanations translate to better human understanding

Reproducibility: no code availability mentioned
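A back-of-the-envelope calculation shows how the cost weakness listed above scales. The 2N-passes-per-step figure comes from this assessment; everything else in the sketch is an assumption for illustration: the MNIST-sized network (784 inputs, 128 hidden units, 10 classes), 1000 training steps, one checkpoint per step, float32 storage, and the reading that the count applies per explained input.

```python
def forward_passes(n_features, n_steps, passes_per_feature=2):
    """Forward passes to explain one input: 2N per training step, over all steps."""
    return passes_per_feature * n_features * n_steps


def checkpoint_bytes(n_params, n_steps, bytes_per_param=4):
    """Storage for one full parameter checkpoint per training step (float32 assumed)."""
    return n_params * n_steps * bytes_per_param


# Hypothetical setup, not taken from the paper.
n_features, n_hidden, n_classes, n_steps = 784, 128, 10, 1000
n_params = n_features * n_hidden + n_hidden + n_hidden * n_classes + n_classes

passes = forward_passes(n_features, n_steps)
storage = checkpoint_bytes(n_params, n_steps)

print(n_params)  # prints "101770"
print(passes)    # prints "1568000"
print(storage)   # prints "407080000"
```

Even for this small network the sketch gives about 1.6 million forward passes per explained input and roughly 407 MB of checkpoints. Both grow linearly with the number of training steps, and the pass count also grows linearly with the number of input features.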

Additional Observations

The paper builds on a very recent preprint by the same authors [37], which limits the novelty somewhat. The relationship between XtrAIn and influence functions or TracIn—which also trace training dynamics to explain predictions—is not discussed, representing a significant gap in the related work.

The qualitative results (Fig. 7) do show visually cleaner attribution maps, but the quantitative advantage is modest and measured only on the authors' own metric.

Rating: 3.8 / 10

Significance 4 · Rigor 4 · Novelty 5.5 · Clarity 6

Generated Jun 10, 2026

Comparison History (19)

Won vs. Multimodal Ordinal Modeling of Alzheimer's Disease Severity Using Structural MRI and Clinical Data

Paper 1 introduces a fundamentally novel approach to feature attribution by shifting occlusion from input space to parameter space, addressing well-known problems (baseline bias, out-of-distribution artifacts, attribution shift) in explainable AI. This methodological innovation has broad applicability across many domains. Paper 2, while well-executed, applies existing techniques (attention mechanisms, ordinal regression, multimodal fusion, Grad-CAM++, SHAP) to AD staging in a relatively incremental manner. Paper 1's conceptual contribution—training-guided attribution—opens new research directions in interpretability, giving it higher potential impact.

claude-opus-4-6 · Jun 11, 2026

Lost vs. Capacity-Constrained Online Convex Optimization with Delayed Feedback

Paper 2 introduces a novel and rigorous theoretical framework for capacity-constrained online convex optimization with delayed feedback, addressing a practical gap in online learning theory. It provides the first regret guarantees for this setting, introduces a new semi-clairvoyant model, and establishes clean theoretical results showing graceful degradation. The work has broad applicability across online learning, optimization, and distributed systems. Paper 1, while addressing an important XAI problem, offers a more incremental contribution to the crowded feature attribution space with primarily empirical validation on limited datasets.

claude-opus-4-6 · Jun 11, 2026

Lost vs. Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper 1 addresses a fundamental architectural component (routers) of Mixture-of-Experts models, which are central to modern large-scale LLMs (e.g., GPT-4, Mixtral). The principled redesign using manifold power iteration to align router rows with principal singular directions of experts is novel, theoretically grounded, and empirically validated at scale (1B-11B parameters). This has broad, immediate applicability to the rapidly growing MoE ecosystem. Paper 2 contributes a useful feature attribution method, but operates in the more incremental XAI/interpretability space with narrower scope and smaller-scale experiments.

claude-opus-4-6 · Jun 11, 2026

Won vs. Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

Paper 2 is more novel in reframing occlusion attribution from input perturbations to training-guided parameter-space updates, directly addressing key known failure modes (baseline bias, OOD artifacts, attribution shift). The application space (explainable AI) is broad and timely across high-stakes domains, with clear practical relevance (e.g., medical subtype classification). It introduces a core method plus scalable variants (Xstep, XtrAIn+), suggesting wider adoption. Paper 1 is useful and rigorous but mainly consolidates/design-simplifies an existing technique with narrower cross-field impact.

gpt-5.2 · Jun 11, 2026

Won vs. Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

Paper 1 introduces a conceptually novel, training-guided occlusion mechanism that shifts attribution from input perturbations to parameter-space updates, directly addressing known failure modes (baseline bias, OOD artifacts, attribution shift). This is methodologically and conceptually relevant across many model families and application domains where interpretability is critical (e.g., healthcare), giving broader cross-field impact. Paper 2 is timely and practically valuable for deploying a specific large diffusion model on consumer GPUs, but it is more engineering- and hardware/model-specific, with narrower generalizability and longer-term scientific reach.

gpt-5.2 · Jun 11, 2026

Lost vs. Encoding the Euler Characteristic Transform

Paper 1 introduces a principled continuous encoding for the Euler Characteristic Transform with systematic architectural comparisons across diverse data modalities, advancing topological data analysis for machine learning. Its contributions—continuous tokenization, modular pipeline design, and comprehensive benchmarking across point clouds, graphs, meshes, and cubical complexes—have broader methodological impact across multiple fields. Paper 2 addresses an important but narrower problem in explainability (occlusion-based attribution), offering incremental improvements with limited experimental scope (image datasets and one biomedical task). Paper 1's mathematical rigor and cross-domain applicability give it higher potential impact.

claude-opus-4-6 · Jun 10, 2026

Won vs. A Systematic Approach for Selecting Trajectories for Data Augmentation

Paper 2 is more novel and broadly impactful: it reframes occlusion attribution by moving perturbations from input space to parameter/training-trajectory space, directly addressing baseline bias and introducing “attribution shift” as a general issue. Interpretability methods are widely applicable across domains and timely for trustworthy ML, with clear practical use in debugging and high-stakes settings (e.g., cancer classification). It also proposes variants (Xstep, XtrAIn+) suggesting methodological extensibility. Paper 1 is rigorous but more domain-specific (trajectory augmentation) and its benefits are explicitly conditional, likely limiting uptake and cross-field influence.

gpt-5.2 · Jun 10, 2026

Lost vs. First-Order Trajectory Matching: Fast Ensemble Predictions of Chaotic, Turbulent, Stochastic Systems

Paper 2 is likely higher impact: it proposes a broadly applicable surrogate modeling framework for stochastic/chaotic/turbulent dynamical systems with direct relevance to physics, climate, fluid dynamics, and uncertainty quantification. The method avoids estimating drift/diffusion/score, offers stability analysis, and targets ensemble statistics and current-like quantities—capabilities valuable across many scientific domains and timely for fast simulation and surrogate modeling. Paper 1 is a novel XAI contribution with practical utility, but its impact is narrower (primarily ML interpretability) and may face adoption barriers due to training-trajectory dependence and compute overhead.

gpt-5.2 · Jun 10, 2026

Won vs. Transformer Based Model for Spatiotemporal Feature Learning in EEG Emotion Recognition

XtrAIn introduces a fundamentally novel perspective on feature attribution by shifting occlusion from input space to parameter space, addressing well-known limitations (baseline bias, OOD samples, attribution shift) in explainability methods. This has broader cross-domain impact spanning interpretable ML, medical AI, and trustworthy AI. Paper 1, while solid, represents an incremental architectural contribution in EEG emotion recognition—a narrower application domain with many competing transformer-based approaches. XtrAIn's methodological innovation and applicability across model types and domains give it higher potential impact.

claude-opus-4-6 · Jun 10, 2026

Lost vs. Escaping the KL Agreement Trap in On-Policy Distillation

Paper 2 targets a timely and widely used training paradigm (on-policy distillation for LLMs/RLHF-like settings) and identifies a concrete failure mode (low-KL agreement trap) with a simple, actionable termination rule that improves accuracy while cutting compute. The contribution is broadly applicable across model families and tasks that use teacher scoring of student rollouts, giving it strong real-world relevance and cross-field impact. Paper 1 is novel for interpretability but is more specialized, potentially heavier to deploy, and likely narrower in downstream adoption.

gpt-5.2 · Jun 10, 2026
