
When Are Neural Interaction Discoveries Real? Identifiability, Recoverability, and a Pre-Fit Diagnostic

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

cs.LG · stat.ML
#1775 of 5669 · cs.LG
Tournament Score: 1442 ± 44
Win Rate: 53% (8 wins, 7 losses, 15 matches)
Rating: 6.5/10
Significance: 7 · Rigor: 7 · Novelty: 7 · Clarity: 7.5

Abstract

When a neural time-series model reports that one variable modulates another's effect on a target, is the discovered interaction a property of the data or an artifact of model flexibility? We argue that this is fundamentally a question of identifiability, governed by the geometry of the observed input support rather than by the specific neural architecture. We study the problem in a multiplicative-gating extension of neural additive vector autoregression (GNAVAR), in which source contributions are modulated by other lagged variables. We show that representational capacity is not identifiability: dependent inputs induce leakage between edge-specific interaction terms, and low-dimensional support permits distinct interaction decompositions that agree on the observed data while differing elsewhere. We then prove a population identifiability theorem for normalized minimal GNAVAR decompositions under explicit support conditions, including settings with shared modulators. The theory yields a simple practitioner-facing diagnostic: the effective rank of the joint lag-block covariance predicts, before fitting, whether interaction recovery is feasible for a given candidate set. When the candidate set is unknown, a two-seed stability check provides a practical operational test. The same support condition organizes empirical outcomes into the three states predicted by the theory. Our results show that interaction recoverability depends on support geometry, that effective rank provides a practical pre-fit diagnostic, and that instability across independent fits is a characteristic signature of non-identifiable interaction discovery. The identifiability phenomenon, the support condition, and the instability signature are model-agnostic; GNAVAR is the vehicle that makes them provable.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper addresses a critical but underexplored question in neural time-series analysis: when a model discovers that one variable modulates another's effect on a target, is this interaction identifiable from the data or merely an artifact of model flexibility? The authors formalize this through G-NAVAR, a multiplicative-gating extension of Neural Additive Vector Autoregression (NAVAR), and prove that interaction recoverability depends fundamentally on the geometry of the observed input support rather than on the model architecture.

The key intellectual contribution is the separation of representational capacity from identifiability — showing that a model can be expressive enough to fit interaction structure while admitting multiple observationally equivalent decompositions. The paper identifies two distinct obstruction mechanisms: (1) dependence-induced leakage between edge-specific interaction terms, and (2) low-dimensional support permitting distinct decompositions that agree on observed data. These are formalized through impossibility results and a population identifiability theorem.
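
To make the second obstruction concrete, here is a toy sketch (not the paper's construction; the helpers `gated` and `additive` are hypothetical). If the observed support collapses onto the diagonal x2 = x1, a gated term x1·x2 and an ungated term x1² take identical values on every observed sample, yet they disagree off the diagonal, so data from this support alone cannot distinguish them.

```python
def gated(x1, x2):
    """Decomposition A: x1's contribution is multiplicatively gated by x2."""
    return x1 * x2


def additive(x1, x2):
    """Decomposition B: an ungated contribution of x1 alone (x2 is ignored)."""
    return x1 ** 2


# Observed support: a one-dimensional diagonal where x2 always equals x1.
observed = [(t / 10, t / 10) for t in range(-20, 21)]
agree_on_support = all(abs(gated(a, b) - additive(a, b)) < 1e-12 for a, b in observed)

# Off the observed support, the two decompositions disagree.
gap = gated(1.0, -1.0) - additive(1.0, -1.0)

print(agree_on_support)  # True
print(gap)               # -2.0
```

Both decompositions fit the observed data equally well, which is why the obstruction is a property of the support rather than of any particular architecture.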

The practical output is a two-stage diagnostic: a pre-fit effective rank measure of the joint lag-block covariance (cheap, closed-form) and a post-fit two-seed stability check. Together, these distinguish three empirical regimes — recoverable interactions, unstable recovery despite adequate support, and non-identifiability under support collapse.
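
A rough sketch of the pre-fit screen follows. It stacks lags 1..p of every variable into one joint lag-block design, estimates the covariance, and reports an entropy-based effective rank (the exponential of the Shannon entropy of the normalized eigenvalues). The entropy definition, the function names, and the pure-Python Jacobi eigen-solver are assumptions made to keep the example self-contained; the paper's exact estimator may differ, for example by using a participation ratio or by restricting the design to a candidate set.

```python
import math
import random


def lag_block_design(series, lags):
    """Stack lags 1..lags of every variable into one row per usable time step."""
    d = len(series[0])
    return [
        [series[t - l][j] for l in range(1, lags + 1) for j in range(d)]
        for t in range(lags, len(series))
    ]


def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def jacobi_eigenvalues(a, rel_tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    total = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def effective_rank(eigenvalues):
    """Entropy effective rank: exp(H) of the clipped, normalized eigenvalues."""
    lam = [max(e, 0.0) for e in eigenvalues]
    total = sum(lam)
    if total == 0.0:
        return 0.0
    h = -sum((e / total) * math.log(e / total) for e in lam if e > 0.0)
    return math.exp(h)


def lag_block_effective_rank(series, lags):
    return effective_rank(jacobi_eigenvalues(covariance(lag_block_design(series, lags))))


# Usage: independent inputs vs. a collapsed support where x2 copies x1.
rng = random.Random(0)
independent = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
collapsed = [[v, v] for v in (rng.gauss(0, 1) for _ in range(2000))]
print(lag_block_effective_rank(independent, lags=1))  # close to 2
print(lag_block_effective_rank(collapsed, lags=1))    # close to 1
```

The screen needs only the inputs, not a fitted model, which is what makes it usable before training.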

Methodological Rigor

The theoretical development is careful and well-structured. The paper systematically addresses gauge symmetries (scale, permutation, trivial-gate) before defining identifiability, which is the correct order of operations. The progression from Lemma 23 (separated multiplicative factorization) through Proposition 24 (single-edge identifiability) to Theorem 25 (full population identifiability) is logically tight.
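
The scale gauge is easy to see numerically: a source contribution f and its gate g can trade a constant factor c, since f·g = (c·f)·(g/c), so two fits can disagree about f and g while agreeing on their product. The sketch below fixes this gauge by rescaling each pair so that the gate has mean 1 over the observed samples. That convention and the name `normalize_pair` are assumptions for illustration, not the paper's normalization, and the sketch covers only the scale gauge, not the permutation or trivial-gate symmetries.

```python
import math


def normalize_pair(f_vals, g_vals):
    """Fix the scale gauge of one gated term f(x) * g(z) on observed samples.

    Hypothetical convention: rescale so the gate has mean 1; f * g is unchanged.
    """
    mean_g = sum(g_vals) / len(g_vals)
    if mean_g == 0.0:
        raise ValueError("gate has zero mean; this normalization is undefined")
    return [f * mean_g for f in f_vals], [g / mean_g for g in g_vals]


# Two fits that differ only by the scale factor c = 3.
f_a, g_a = [1.0, 2.0, 3.0], [0.5, 1.0, 1.5]
f_b, g_b = [3.0 * f for f in f_a], [g / 3.0 for g in g_a]

# Same products on the data, different components...
products_match = all(
    math.isclose(fa * ga, fb * gb) for fa, ga, fb, gb in zip(f_a, g_a, f_b, g_b)
)

# ...but identical components once the gauge is fixed.
na_f, na_g = normalize_pair(f_a, g_a)
nb_f, nb_g = normalize_pair(f_b, g_b)
components_match = all(math.isclose(x, y) for x, y in zip(na_f + na_g, nb_f + nb_g))

print(products_match, components_match)  # True True
```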

The treatment of shared modulators via hierarchically orthogonal functional decomposition (HOFD) is a notable technical contribution that extends beyond the simpler disjoint-support case. The authors are commendably transparent about what is proven versus assumed — Assumption 20 (gate-interaction faithfulness) is stated as a hypothesis rather than derived, and Remark 22 explicitly catalogs the remaining gaps.
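
For intuition about what HOFD generalizes, here is the classical functional ANOVA decomposition in its simplest setting: two independent inputs on a product grid with uniform weights, where each component can be read off from conditional means. HOFD is needed precisely when inputs are dependent and this product structure fails, so the sketch below does not implement HOFD; `functional_anova_2d` is a hypothetical helper.

```python
def functional_anova_2d(f, xs, zs):
    """Classical functional ANOVA of f(x, z) on a product grid with uniform weights.

    Returns (f0, f1, f2, f12): constant, main effect of x, main effect of z,
    and pure interaction, so that f = f0 + f1[i] + f2[j] + f12[i][j] on the grid.
    """
    nx, nz = len(xs), len(zs)
    table = [[f(x, z) for z in zs] for x in xs]
    f0 = sum(sum(row) for row in table) / (nx * nz)
    f1 = [sum(row) / nz - f0 for row in table]
    f2 = [sum(table[i][j] for i in range(nx)) / nx - f0 for j in range(nz)]
    f12 = [[table[i][j] - f0 - f1[i] - f2[j] for j in range(nz)] for i in range(nx)]
    return f0, f1, f2, f12


# f(x, z) = x*z + x: main effect x for the first input, none for the second,
# and x*z as the pure interaction component.
grid = [-1.0, 0.0, 1.0]
f0, f1, f2, f12 = functional_anova_2d(lambda x, z: x * z + x, grid, grid)
print(f0, f1, f2)
print(f12)
```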

The synthetic experiments are well-designed, with the ρ-sweep cleanly demonstrating the continuous degradation of recovery as effective rank decreases. The capacity-matched comparisons (against MLP, GA2M, and additive NAVAR) are methodologically sound — they control for parameter count and demonstrate that G-NAVAR pays no accuracy penalty for interpretability.
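
Under simplifying assumptions (two standardized inputs with correlation ρ, a single lag, and the entropy-based effective rank; the paper's sweep may be set up differently), the link between ρ and effective rank has a closed form:

```latex
\Sigma_\rho = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad \lambda_\pm = 1 \pm \rho,
\qquad p_\pm = \frac{\lambda_\pm}{\lambda_+ + \lambda_-} = \frac{1 \pm \rho}{2},
\\[4pt]
\operatorname{erank}(\rho) = \exp\!\big( -p_+ \log p_+ - p_- \log p_- \big),
\qquad \operatorname{erank}(0) = 2,
\qquad \lim_{|\rho| \to 1} \operatorname{erank}(\rho) = 1.
```

For example, ρ = 0.9 gives normalized eigenvalues 0.95 and 0.05 and an effective rank of about 1.22, so the support is already close to one-dimensional.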

However, there are some limitations in rigor. The real-data experiments, while illustrative, involve only three domains with limited ablation. The effective rank threshold separating "adequate" from "inadequate" support is not formally derived — it's observed empirically. The two-seed stability check, while practical, lacks formal guarantees about its power (how often does it detect non-identifiability when present?).

Potential Impact

The paper's most impactful contribution may be the conceptual framework rather than the specific model. The insight that interaction identifiability is governed by support geometry is genuinely model-agnostic, as the authors claim. This means the effective-rank diagnostic and stability-check workflow could be adopted by practitioners using attention-based models, graph neural networks for time series, or any method that claims to discover variable interactions.

The practical workflow (Figure 1) is immediately actionable: compute effective rank before fitting, run two seeds after fitting, interpret only if both pass. This could become standard practice in applied neural Granger causality, particularly in domains like climate science, economics, and finance where interaction claims carry policy implications.
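
The workflow can be written as a small decision rule. In this sketch the agreement measure (Pearson correlation between the two seeds' interaction estimates at shared evaluation points) and both thresholds are placeholders chosen for illustration; the review notes that the paper does not derive a rank threshold, and the names `pearson` and `triage` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def triage(erank, erank_min, seed_a, seed_b, agreement_min=0.9):
    """Map the pre-fit and post-fit checks onto the three regimes.

    erank:          pre-fit effective rank of the joint lag-block covariance
    erank_min:      user-chosen placeholder threshold (not derived)
    seed_a, seed_b: interaction estimates from two independently seeded fits,
                    evaluated at the same points
    """
    if erank < erank_min:
        return "support_collapse"  # support cannot identify the interaction
    if pearson(seed_a, seed_b) < agreement_min:
        return "unstable"  # support looks adequate, but the fits disagree
    return "recoverable"  # both checks pass; interpret with the usual caveats


print(triage(1.05, 1.5, [1, 2, 3, 4], [1, 2, 3, 4]))         # support_collapse
print(triage(2.0, 1.5, [1, 2, 3, 4], [4, 3, 2, 1]))          # unstable
print(triage(2.0, 1.5, [1, 2, 3, 4], [1.1, 2.0, 3.1, 3.9]))  # recoverable
```

A natural extension, in line with the bootstrap-based check suggested under Limitations, is to compute the agreement over more than two seeds and report its spread rather than a single pass/fail label.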

The connection to functional ANOVA, tensor decomposition identifiability, and nonlinear ICA creates useful bridges between communities. The impossibility results (Proposition 7, Corollary 26) should give pause to practitioners who currently over-interpret learned interaction structures.

Timeliness & Relevance

This work addresses a genuine and growing need. Neural time-series models are increasingly used to make mechanistic claims (e.g., "temperature modulates the ozone response to NO₂"), but the community has largely ignored whether such claims are identifiable. As interpretable ML and causal discovery from observational data gain traction in high-stakes domains, formal identifiability analysis becomes essential. The paper's timing is appropriate: the field has developed sufficient modeling capacity that the identifiability question is now the binding constraint.

Strengths

1. Conceptual clarity: The distinction between representational capacity and identifiability is precisely articulated and widely applicable.

2. Complete theoretical arc: From gauge symmetries through impossibility results to positive identifiability theorems, the theory is self-contained.

3. Practical diagnostics: The effective-rank pre-fit screen and two-seed stability check are simple, interpretable, and immediately usable.

4. Honest empirical taxonomy: The three real-data domains cleanly illustrate the three predicted regimes, and the WDI failure is presented as informatively as the Beijing success.

5. Appropriate epistemic humility: The paper explicitly notes that stability is supporting evidence, not proof of correctness, and catalogs failure modes (F1-F4) with precision.

Limitations

1. Narrow model class for formal results: While the insights are claimed to be model-agnostic, the theorems are specific to G-NAVAR with multiplicative gating. Extension to attention mechanisms or GNNs is left entirely to future work.

2. No formal finite-sample theory: All identifiability results are population-level. The gap between population identifiability and finite-sample recoverability is acknowledged but not formally characterized.

3. Limited real-data validation: Three domains with one target each is thin evidence for the claimed taxonomy. The Beijing result involves a well-known photochemical relationship, limiting novelty of the empirical finding.

4. Effective rank threshold: The diagnostic's decision boundary is not theoretically derived; practitioners must still exercise judgment about what constitutes "adequate" rank.

5. Scalability concerns: The paper does not discuss how the diagnostic and method scale to high-dimensional systems (dozens or hundreds of variables), where the combinatorial explosion of candidate modulator sets becomes problematic.

6. The two-seed check is weak: Using only two seeds provides limited statistical power for detecting instability. A more systematic approach (e.g., bootstrap-based) would strengthen the operational test.

Additional Observations

The paper's framing for ICDM suggests a data-mining audience, but its theoretical content is closer to what a statistics or machine-learning theory venue would publish. The computational cost (22 minutes total) is admirably low, which aids reproducibility. The discussion of interaction-order identifiability (Section X) opens an interesting direction but remains speculative.


Generated Jun 9, 2026

Comparison History (15)

Lost vs. Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?

Paper 1 addresses a highly practical and timely problem—fine-tuning large time series models—with a simple, broadly applicable method (SFF) validated across eight major LTSMs. Given the rapid growth of foundation models for time series, this work has immediate wide applicability and addresses a critical barrier (overfitting during fine-tuning). Paper 2 makes important theoretical contributions on identifiability of neural interaction discoveries, but its scope is narrower (specific model class, interaction recovery), limiting its breadth of impact despite strong methodological rigor.

claude-opus-4-6 · Jun 9, 2026

Won vs. Physics-Guided Dual Decoding and Spectral Supervision for Global 3D Hydrometeor Prediction

Paper 2 addresses a fundamental methodological question about identifiability of neural interaction discoveries that applies broadly across any field using neural time-series models. It provides theoretical guarantees (identifiability theorems), practical pre-fit diagnostics, and model-agnostic insights that could reshape how researchers validate discovered interactions. Paper 1, while technically strong in atmospheric science, is more domain-specific and incremental (combining known techniques like wavelet decoupling, adversarial training, and dual decoding). Paper 2's contributions to understanding when neural network discoveries are trustworthy have broader cross-disciplinary impact.

claude-opus-4-6 · Jun 9, 2026

Won vs. STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

Paper 2 addresses a fundamental theoretical question about identifiability of neural interaction discoveries that applies broadly across scientific domains using neural time-series models. It provides rigorous theoretical results (identifiability theorems, support conditions) and practical diagnostics that can prevent spurious scientific conclusions. Its model-agnostic insights about when discovered interactions are real versus artifacts have broad interdisciplinary impact across neuroscience, economics, climate science, and beyond. Paper 1, while technically strong and practically useful for LLM inference efficiency, represents an incremental engineering advance in KV cache compression within a narrow, rapidly evolving subfield.

claude-opus-4-6 · Jun 9, 2026

Lost vs. Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents

Paper 1 addresses a critical and highly timely bottleneck in AI research: the safe, offline evaluation of LLM agents. By innovating at the intersection of diffusion world models and discrete autoregressive actions, it offers a highly practical framework with immediate real-world utility. While Paper 2 provides rigorous theoretical contributions to time-series interpretability, Paper 1's alignment with the rapidly expanding field of agentic AI gives it greater potential for broad, immediate scientific impact and widespread adoption.

gemini-3.1-pro-preview · Jun 9, 2026

Won vs. Less is MoE: Trimming Experts in Domain-Specialist Language Models

Paper 2 likely has higher scientific impact: it addresses a fundamental, broadly recurring problem—whether learned neural interactions are identifiable—providing theorems, explicit support conditions, and practical pre-fit/stability diagnostics that are model-agnostic and applicable across many time-series domains (neuroscience, econometrics, climate, biology). This combination of conceptual clarity, methodological rigor, and actionable diagnostics can change how interaction claims are evaluated. Paper 1 is novel and useful for MoE deployment, but its impact is more specialized to current MoE architectures and compression practice.

gpt-5.2 · Jun 9, 2026

Lost vs. Generative Modeling of Discrete Latent Structures via Dynamic Policy Gradients

Paper 2 demonstrates immediate and significant real-world utility by successfully applying its novel machine learning framework to a high-impact bioinformatics problem (RNA sequencing), outperforming standard domain-specific algorithms. While Paper 1 provides crucial theoretical insights into model interpretability, Paper 2's direct translation of a methodological advancement to solve a combinatorial biological problem suggests a broader and faster practical scientific impact.

gemini-3.1-pro-preview · Jun 9, 2026

Lost vs. LLM Explainability with Counterfactual Chains and Causal Graphs

While Paper 1 offers strong theoretical rigor and important identifiability proofs for time-series models, Paper 2 tackles the highly pressing and widely applicable issue of LLM explainability. By introducing a novel causal graph approach for modeling LLM inference, Paper 2 provides a tangible method for interpreting complex models across domains like healthcare and automated evaluation. The explosive adoption of LLMs makes Paper 2's focus more timely, giving it a higher potential for broad scientific and practical impact across multiple fields.

gemini-3.1-pro-preview · Jun 9, 2026

Lost vs. The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models

Paper 2 likely has higher scientific impact due to timeliness and broad applicability: it targets a central current problem (LLM evaluation reliability) with theory that yields actionable benchmark-selection methods and explains observed leaderboard instability. Its contributions span ML evaluation, statistics/geometry (stereological bounds), and even resolve a longstanding convex-geometry problem (Gardner 1.5), increasing cross-field reach. Paper 1 is rigorous and valuable for neural time-series identifiability, but its immediate audience and applications are narrower than the LLM-evaluation framework and its practical implications.

gpt-5.2 · Jun 9, 2026

Lost vs. Causal Modeling of Selection in Evolution

Paper 1 addresses a fundamental gap in causal modeling by distinguishing static from evolutionary selection—a distinction relevant across biology, social science, and AI. It introduces a new graphical model with sound and complete identification procedures, potentially reshaping how causal discovery handles evolutionary processes. Paper 2 makes solid contributions to neural network identifiability for interaction discovery, but its scope is narrower, focusing on a specific model class (GNAVAR) and a diagnostic tool. Paper 1's broader cross-disciplinary applicability and foundational theoretical contribution give it higher impact potential.

claude-opus-4-6 · Jun 9, 2026

Lost vs. Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior

Paper 1 (INNSteer) addresses a highly timely and practically important problem—controlling LLM behavior at inference time—with a novel nonlinear approach using invertible neural networks. The breadth of experiments across multiple LLM families, scales, and safety benchmarks demonstrates strong practical applicability. Given the intense focus on AI safety and alignment, this work has immediate real-world relevance and broad impact. Paper 2 makes rigorous theoretical contributions to identifiability of neural interaction discovery, but targets a narrower audience (time-series causal discovery) with more specialized applications, limiting its breadth of impact despite strong methodological rigor.

claude-opus-4-6 · Jun 9, 2026