The Query Channel: Information-Theoretic Limits of Masking-Based Explanations
Erciyes Karakaya, Ozgur Ercetin
Abstract
Masking-based post-hoc explanation methods, such as KernelSHAP and LIME, estimate local feature importance by querying a black-box model under randomized perturbations. This paper formulates this procedure as communication over a query channel, where the latent explanation acts as a message and each masked evaluation is a channel use. Within this framework, the complexity of the explanation is captured by the entropy of the hypothesis class, while the query interface supplies information at a rate determined by an identification capacity per query. We derive a strong converse showing that, if the explanation rate exceeds this capacity, the probability of error in exact recovery necessarily converges to one for any sequence of explainers and decoders. We also prove an achievability result establishing that a sparse maximum-likelihood decoder attains reliable recovery when the rate lies below capacity. A Monte Carlo estimator of mutual information yields a non-asymptotic query benchmark that we use to compare optimal decoding with Lasso- and OLS-based procedures that mirror LIME and KernelSHAP. Experiments reveal a range of query budgets where information theory permits reliable explanations but standard convex surrogates still fail. Finally, we interpret super-pixel resolution and tokenization for neural language models as a source-coding choice that sets the entropy of the explanation and show how Gaussian noise and nonlinear curvature degrade the query channel, induce waterfall and error-floor behavior, and render high-resolution explanations unattainable.
AI Impact Assessments
Scientific Impact Assessment: "The Query Channel: Information-Theoretic Limits of Masking-Based Explanations"
1. Core Contribution
This paper reframes masking-based post-hoc explanation methods (LIME, KernelSHAP) as communication over a "query channel," where the unknown explanation vector is a message, binary masks are channel inputs, and oracle evaluations are channel outputs. The key results are: (a) a strong converse theorem showing that when explanation rate exceeds channel capacity, no decoder can reliably recover explanations; (b) an achievability result via ML decoding below capacity; (c) a source-coding interpretation of super-pixel resolution that predicts a critical segmentation beyond which explanations are information-theoretically impossible.
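To make the channel picture concrete, the following is a minimal sketch of one possible instance, not the paper's implementation. It assumes the message is a k-sparse support with unit weights, masks are i.i.d. Bernoulli(1/2), and the oracle returns the sum of the unmasked support features plus Gaussian noise. Under those assumptions, exhaustive maximum-likelihood decoding reduces to picking the support with the smallest squared residual. The function names and the default values (d=12, k=2, sigma=0.1, chosen small so enumeration stays cheap) are illustrative.

```python
import itertools
import random


def simulate_queries(support, d, T, sigma, rng):
    """Draw T i.i.d. Bernoulli(1/2) masks and the noisy oracle outputs they produce."""
    masks, outputs = [], []
    for _ in range(T):
        z = [rng.randint(0, 1) for _ in range(d)]
        # Assumed oracle: unit-weight sum over the unmasked support features.
        y = sum(z[i] for i in support) + rng.gauss(0.0, sigma)
        masks.append(z)
        outputs.append(y)
    return masks, outputs


def ml_decode(masks, outputs, d, k):
    """Exhaustive ML decoding: with Gaussian noise this is least squared residual."""
    def residual(candidate):
        return sum(
            (y - sum(z[i] for i in candidate)) ** 2
            for z, y in zip(masks, outputs)
        )
    return min(itertools.combinations(range(d), k), key=residual)


def error_rate(d=12, k=2, sigma=0.1, T=8, trials=200, seed=0):
    """Empirical probability that exact support recovery fails with T queries."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        support = tuple(sorted(rng.sample(range(d), k)))
        masks, outputs = simulate_queries(support, d, T, sigma, rng)
        errors += ml_decode(masks, outputs, d, k) != support
    return errors / trials
```

Sweeping `T` in `error_rate` gives a rough view of the transition from unreliable to reliable recovery in this toy setting. Enumeration costs grow as C(d, k), so the sketch is only practical for small d and k.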
The conceptual reframing is the paper's primary contribution—translating XAI instability from an algorithmic concern into a fundamental information-theoretic limit. The identification of an "algorithmic gap" between what information theory permits and what practical convex surrogates (Lasso, OLS) achieve is a useful diagnostic concept.
2. Methodological Rigor
The theoretical framework is correct but involves relatively straightforward adaptations of classical results. Theorem 1 applies Wolfowitz's strong converse by observing that M → Φ → (Z, Y) forms a Markov chain and bounding mutual information via the data processing inequality—the technical novelty beyond this observation is limited. The achievability proof (Theorem 2) follows standard random coding arguments from compressed sensing, and the resulting T = Ω(k log(d/k)) scaling was already established by Wainwright (2009) using different machinery.
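The k log(d/k) scaling can be seen from a simple counting argument: a uniformly drawn k-sparse support among d features carries log2 C(d, k) bits, which lies between k log2(d/k) and k log2(e·d/k), and each query can deliver at most the per-query capacity. The sketch below computes that entropy and the resulting lower bound on the number of queries. The parameter `capacity_bits` is a hypothetical input; how the paper actually computes its identification capacity is not reproduced here.

```python
import math


def support_entropy_bits(d: int, k: int) -> float:
    """Entropy in bits of a uniformly drawn k-sparse support among d features."""
    return math.log2(math.comb(d, k))


def min_queries(d: int, k: int, capacity_bits: float) -> int:
    """Counting lower bound T >= H / C on the number of queries.

    capacity_bits is an assumed per-query capacity supplied by the caller.
    """
    return math.ceil(support_entropy_bits(d, k) / capacity_bits)
```

For d=12 and k=2 there are 66 candidate supports, so the entropy is log2(66), about 6.04 bits. At a hypothetical capacity of 1 bit per query, `min_queries(12, 2, 1.0)` returns 7.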
The treatment of model misfit as additive Gaussian noise (Section II-D) is acknowledged as a "worst-case lower bound on capacity" but represents a significant simplification. Real model nonlinearities produce structured, mask-dependent interference that may violate the i.i.d. assumption central to the memoryless channel formulation. The paper's own interference-limited experiments (Section V-D) reveal error floors, but the theoretical framework cannot tightly characterize these since the Gaussian approximation is loose for structured curvature.
The Monte Carlo MI estimator (Equation 34) is standard importance sampling. The experimental setup (d=12, k=2, σ=0.1) enables exhaustive ML decoding but is far from realistic XAI scenarios where d ranges from hundreds to thousands. The image experiments use a controlled oracle (intensity summation) rather than an actual neural network, limiting external validity.
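As an illustration of what a Monte Carlo mutual-information estimate looks like in a setting this small, the sketch below estimates the per-query I(M; Y | Z) by averaging the log-likelihood ratio between the true support and the uniform mixture over all supports. It is not the paper's Equation 34. It assumes a unit-weight linear oracle, Bernoulli(1/2) masks, Gaussian noise, and a uniform prior over k-sparse supports, and the function name is hypothetical.

```python
import itertools
import math
import random


def estimate_mi_per_query(d=12, k=2, sigma=0.1, samples=4000, seed=0):
    """Monte Carlo estimate (bits) of I(M; Y | Z) for a single masked query."""
    rng = random.Random(seed)
    supports = list(itertools.combinations(range(d), k))

    def likelihood(y, z, support):
        # Unnormalized Gaussian likelihood; the constant cancels in the ratio.
        mean = sum(z[i] for i in support)
        return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2))

    total = 0.0
    for _ in range(samples):
        true_support = rng.choice(supports)
        z = [rng.randint(0, 1) for _ in range(d)]
        y = sum(z[i] for i in true_support) + rng.gauss(0.0, sigma)
        numerator = likelihood(y, z, true_support)
        # Marginal p(y | z) under the uniform prior over all supports.
        marginal = sum(likelihood(y, z, s) for s in supports) / len(supports)
        total += math.log2(numerator / marginal)
    return total / samples
```

Under these assumptions with k=2, the noiseless output takes at most three values, so the true per-query information cannot exceed log2(3) bits. Dividing the support entropy by such an estimate gives a rough query benchmark in the spirit the assessment describes.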
3. Potential Impact
The paper could shift how the XAI community conceptualizes explanation reliability—from purely algorithmic instability to fundamental information limits. This is analogous to how information-theoretic bounds shaped compressed sensing research. Specific potential impacts include:
However, practical impact is constrained by several factors: the bounds are likely loose for realistic models, no new algorithms are proposed to close the identified gap, and the framework applies most cleanly to linear-Gaussian settings that poorly approximate real neural network behavior.
4. Timeliness & Relevance
XAI reliability is an active concern, with growing regulatory interest (EU AI Act) in explanation quality. The paper addresses the genuine problem that LIME and KernelSHAP produce unstable explanations, and the observation that this may be fundamental rather than fixable is timely. The connection to Rao (2025), which addresses complexity-theoretic limits via Kolmogorov complexity, positions this work on a complementary axis—recoverability vs. existence of explanations.
5. Strengths & Limitations
Strengths:
Limitations:
6. Additional Observations
The experiments use non-adaptive (i.i.d.) mask sampling throughout, while the paper claims the strong converse also holds for adaptive strategies. Demonstrating that adaptive querying can approach capacity more efficiently would substantially increase practical relevance. The connection between the query channel and the active learning and Bayesian experimental design literatures is underexplored and could yield richer results.
Generated May 5, 2026
Comparison History (29)
Paper 2 has higher potential impact: it introduces a unifying information-theoretic framework for a broad and widely used class of explainability methods (LIME/SHAP), derives strong converse and achievability bounds (high methodological rigor), and yields actionable guidance on query budgets, resolution/tokenization, and when explanations are fundamentally impossible. These results can influence interpretability, auditing, and regulatory practice across ML subfields. Paper 1 is timely and practically useful for LLM efficiency, but is a narrower, incremental algorithmic refinement with more limited cross-domain theoretical reach.
Paper 1 introduces a more novel, theoretically grounded framework by casting masking-based explainers as an information-theoretic query channel, deriving strong converse and achievability results with capacity-style limits and non-asymptotic benchmarks. This offers durable, broadly applicable insights across interpretability, statistics, and information theory, with clear methodological rigor and potential to reshape how query-based explanations are evaluated. Paper 2 is timely and practically useful for LLM efficiency, but is a training-free heuristic aggregation method with narrower conceptual novelty and more incremental impact relative to existing early-exit/test-time scaling work.
Paper 2 introduces a novel information-theoretic framework for understanding fundamental limits of masking-based explanation methods (SHAP, LIME), establishing capacity bounds analogous to Shannon's channel coding theorem. This provides deep theoretical foundations for explainable AI, a critically important field. The strong converse and achievability results give fundamental impossibility/possibility guarantees that transcend specific algorithms. Paper 1, while solid engineering combining Mamba with frequency-domain analysis for time series forecasting, is more incremental—combining known components in a new architecture. Paper 2's theoretical contributions have broader, more lasting impact across XAI, information theory, and trustworthy ML.
Paper 1 offers a more fundamentally novel and rigorous contribution: an information-theoretic formulation of masking-based explainers with strong converse and achievability results, yielding principled limits and benchmarks that can reshape how the field evaluates explainability methods. Its impact can extend across ML interpretability, statistics, information theory, and black-box model auditing, with clear timeliness given widespread LIME/SHAP use. Paper 2 is application-relevant and timely for LLM agents, but appears more as a systems/architecture proposal with empirical gains whose generality and theoretical depth may be less durable than Paper 1’s foundational limits.
Paper 2 introduces a novel, rigorous information-theoretic framework to understand the fundamental limits of explainable AI. By bridging communication theory and XAI, it provides mathematically grounded bounds that outlast specific model architectures. While Paper 1 is highly practical and timely for LLM agent system design, its findings are empirical and tied to current model generations. Paper 2's deep theoretical contributions and methodological rigor offer a more lasting and profound scientific impact.
Paper 2 addresses the critical and timely problem of LLM safety/jailbreaks with a practical, mechanistic approach (LOCA) that provides actionable local causal explanations. Its direct applicability to AI safety for frontier models gives it broad urgency and real-world impact. Paper 1 provides elegant information-theoretic foundations for explainability methods, but its contributions are more theoretical and incremental within the XAI community. Paper 2's combination of novelty (local causal explanations vs. global), timeliness (LLM safety is a top priority), and demonstrated empirical superiority over baselines positions it for higher near-term scientific impact.
Paper 1 establishes fundamental information-theoretic limits for a widely-used class of explainability methods (LIME, KernelSHAP), providing novel theoretical foundations (converse and achievability results) that reframe explanation quality as a channel capacity problem. This offers deep, lasting insight into why and when explanations fail, with broad implications across XAI. Paper 2, while practically useful, is primarily an engineering benchmark contribution for LLM agents—a fast-moving area where benchmarks are quickly superseded. Paper 1's theoretical framework is more likely to have enduring scientific impact across multiple fields.
Paper 2 introduces a novel theoretical framework connecting explainable AI to information theory, deriving fundamental limits (strong converse, achievability) for masking-based explanation methods like SHAP and LIME. This provides deep, generalizable insights with broad implications across ML interpretability, information theory, and trustworthy AI. Paper 1, while practically useful as a benchmark suite for LLM agents, is more incremental—improving evaluation methodology rather than establishing fundamental theoretical contributions. Paper 2's information-theoretic bounds will likely influence how the community designs and understands explanation methods for years to come.
Paper 1 introduces a foundational theoretical framework linking information theory and explainable AI, establishing fundamental mathematical limits for explanation methods. Its rigorous proofs and theoretical bounds will likely have a long-lasting influence on XAI algorithm design. In contrast, Paper 2 is an empirical benchmarking study of current agent frameworks; while practically useful, its findings are transient and will quickly become outdated as the evaluated software is updated.
Paper 1 offers a principled, information-theoretic reframing of a broad class of masking-based explainers with strong converse and achievability results, plus non-asymptotic benchmarks and empirical validation—high methodological rigor and likely lasting impact across XAI, ML theory, and evaluation practice. Paper 2 is timely and application-relevant for LLM post-training, but its core contributions (multi-agent judging reward, GRPO variant, dataset grounding) seem more incremental and sensitive to empirical setup, with narrower theoretical guarantees and potentially shorter shelf-life as LLM training paradigms evolve.
Paper 2 establishes fundamental information-theoretic limits for widely-used explanation methods (SHAP, LIME), providing a novel theoretical framework with broad implications across all of ML interpretability. Its strong converse and achievability results are mathematically rigorous and reveal fundamental impossibility boundaries that apply regardless of algorithmic improvements. Paper 1, while practically valuable for RTL optimization, is more domain-specific (EDA/chip design) and incremental in nature. Paper 2's breadth of impact across explainable AI, its novel channel-coding formulation, and its potential to reshape how the community thinks about explanation fidelity give it higher long-term scientific impact.
Paper 2 addresses a critical and highly timely issue in AI safety—the subliminal transfer of unsafe behaviors during agent distillation. Its finding that explicit data sanitization is an insufficient defense has immediate, broad implications for the alignment and deployment of autonomous AI agents. While Paper 1 offers a rigorous and novel theoretical framework for XAI, Paper 2's direct relevance to preventing unsafe AI behaviors gives it greater urgency and potential for broad real-world impact.
Paper 2 establishes fundamental information-theoretic limits for widely used explainable AI (XAI) methods like LIME and SHAP. By formulating masking-based explanations as a communication channel, it provides rigorous mathematical bounds on explainability. While Paper 1 offers a highly relevant empirical study on AI persuasion, Paper 2's theoretical rigor dictates the foundational limits of algorithmic interpretability, offering a broader and potentially longer-lasting impact across the core of machine learning and algorithm design.
Paper 2 introduces a novel information-theoretic framework for understanding fundamental limits of masking-based explanations (SHAP, LIME), bridging information theory and explainable AI. This conceptual contribution—formulating explanation as communication over a query channel with converse and achievability results—provides foundational theoretical insights applicable broadly across XAI. It identifies fundamental impossibility results and explains when/why popular methods fail. Paper 1, while solid, is more incremental, combining known techniques (kNN, graph structure learning) for heterogeneous graphs with heterophily. Paper 2's cross-disciplinary novelty and theoretical depth suggest broader and more lasting impact.
Paper 2 offers a more novel, general theoretical framing—casting masking-based explainers as an information-theoretic query channel—and delivers strong converse and achievability results with non-asymptotic benchmarks, enabling principled limits and diagnostics across many explanation methods and model types. Its methodological rigor and breadth (ML interpretability, information theory, optimization, NLP/CV tokenization) suggest durable, cross-field impact and timeliness given widespread reliance on LIME/SHAP-style tools. Paper 1 is timely and applied to safety/alignment, but is more domain-specific and likely less broadly foundational.
Paper 1 addresses the critical and highly timely challenge of LLM reasoning and alignment. By introducing a novel optionized next-token prediction method, it significantly boosts math reasoning performance and sample efficiency for RLHF. Its practical utility and direct applicability to state-of-the-art LLMs suggest a broader and more immediate impact on the field compared to Paper 2, which, while theoretically rigorous, focuses on bounds for existing explainability methods.
Paper 1 introduces a fundamental impossibility theorem for AI governance, analogous to Arrow's theorem but for AI accountability. Its rigorous mathematical proof that highly autonomous AI systems fundamentally break existing accountability frameworks has profound implications across AI safety, law, ethics, and policy. While Paper 2 offers strong theoretical advancements in Explainable AI, Paper 1's cross-disciplinary breadth, timeliness regarding autonomous agents, and potential to reshape global AI regulatory paradigms give it a significantly higher potential for broad scientific and societal impact.
Paper 2 establishes fundamental information-theoretic limits for widely used explainability methods (LIME, SHAP). This high methodological rigor and profound theoretical contribution to Explainable AI offers broad impact across all fields relying on black-box models. While Paper 1 presents a valuable, timely healthcare application, Paper 2 provides foundational scientific insights that dictate the theoretical bounds of AI interpretability.
Paper 2 introduces a novel information-theoretic framework for understanding fundamental limits of masking-based explanation methods (SHAP, LIME), providing rigorous theoretical contributions including converse and achievability results. This creates a new theoretical foundation connecting explainable AI with information theory, offering broadly applicable insights across any domain using these popular methods. Paper 1, while timely, is primarily an empirical survey of healthcare agent skills on a specific platform, with narrower scope and more incremental contributions to the field.
Paper 2 makes a rigorous, novel theoretical contribution by framing masking-based explanations (SHAP, LIME) as an information-theoretic channel problem, deriving fundamental limits (strong converse, achievability). This provides deep, generalizable insights into widely-used XAI methods, identifying regimes where current methods fundamentally fail. Its mathematical rigor, connection to established information theory, and broad applicability to explainability research give it high impact potential. Paper 1 proposes a systems architecture (Aethon) for agent instantiation but is largely conceptual, lacking empirical validation, and addresses an engineering optimization rather than revealing fundamental scientific principles.