Fundamental Limitation in Explaining AI
Atsushi Suzuki, Jing Wang
Abstract
While large-scale models such as LLMs and diffusion models have achieved practical success, public institutions have emphasized the importance of explainability in AI. Existing methods for explaining AI, however, are not designed to provide completely faithful explanations of the behavior of large-scale AI systems. Although a completely faithful and interpretable explanation of the behavior of an AI system might be useful for AI governance, it has not been known whether providing such an explanation is theoretically possible. In this paper, we mathematically prove a fundamental quadrilemma in explaining AI, stating that AI and its explanation cannot satisfy the following four conditions simultaneously: 1) the complexity of the operation environment, 2) the goodness of the AI's performance, 3) the interpretability of the AI's explanation, and 4) the complete faithfulness of the AI's explanation. This quadrilemma suggests that, in most applications where we cannot change the environment or sacrifice good AI performance and an interpretable explanation, we should give up complete faithfulness of explanations and should instead aim to explain only the parts that are important for applications. As a consequence, the quadrilemma implies that AI governance should be designed on the premise that the faithfulness of AI explanations is always incomplete.
AI Impact Assessments
Scientific Impact Assessment: "Fundamental Limitation in Explaining AI"
1. Core Contribution
This paper proves a formal impossibility result—termed a "fundamental quadrilemma"—showing that an AI system and its explanation cannot simultaneously satisfy four conditions: (1) complexity of the operating environment, (2) good AI performance, (3) interpretability of explanations, and (4) complete faithfulness of explanations. The central technical result is an inequality (Theorem 1) relating expected logarithmic conditional perplexity, explanation length, and the expected conditional Kolmogorov complexity of the environment. The key implication is that in complex environments where good performance is required and explanations must be human-interpretable, complete faithfulness of explanations must be sacrificed.
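This assessment does not reproduce Theorem 1, so the following is only a schematic reading of the shape such an inequality could take, inferred from the quantities named above. All symbols are assumptions introduced here for illustration: $X$ and $Y$ are the environment's input and output, $P_{\mathrm{AI}}$ is the AI's conditional output distribution, $e$ is a completely faithful explanation of length $\ell(e)$, $C(\cdot \mid \cdot)$ is conditional Kolmogorov complexity, and $c$ is a constant. The paper's exact terms may differ, for example by logarithmic corrections.

```latex
% Schematic only; not the paper's exact statement of Theorem 1.
\underbrace{\mathbb{E}\bigl[C(Y \mid X)\bigr]}_{\text{environment complexity}}
\;\le\;
\underbrace{\ell(e)}_{\text{explanation length}}
\;+\;
\underbrace{\mathbb{E}\bigl[-\log_2 P_{\mathrm{AI}}(Y \mid X)\bigr]}_{\text{expected log conditional perplexity}}
\;+\; c
```

Read this way, a large left-hand side (complex environment) combined with a small perplexity term (good performance) forces $\ell(e)$ to be large, which would rule out a short, and therefore interpretable, completely faithful explanation.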
2. Methodological Rigor
The paper is mathematically rigorous. The formalization is thorough, with careful definitions of stochastic input-output AI (encompassing LLMs, diffusion models, and deterministic systems), completely faithful explanations via interpretation functions, and environment complexity via expected conditional Kolmogorov complexity. The proof chain is well-structured: Shannon-Fano-Elias coding provides the code-length bound, which connects perplexity to prefix-free Kolmogorov complexity, and then the relationship between plain and prefix-free Kolmogorov complexity closes the gap.
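The code-length bound mentioned in the proof chain can be illustrated with a minimal sketch of textbook Shannon-Fano-Elias coding. This is not the paper's construction; the function names `sfe_code` and `is_prefix_free` and the four-symbol distribution are hypothetical choices made for illustration. The sketch shows that a symbol with probability p receives a codeword of ceil(log2(1/p)) + 1 bits, which is less than log2(1/p) + 2, and that the resulting code is prefix-free.

```python
from fractions import Fraction


def sfe_code(probs):
    """Shannon-Fano-Elias codewords for a mapping symbol -> probability.

    Illustrative helper (hypothetical name). Probabilities must be positive
    and are handled as exact fractions to avoid rounding issues.
    """
    codes = {}
    cumulative = Fraction(0)
    for symbol, p in probs.items():
        p = Fraction(p)
        # Midpoint of this symbol's interval in the cumulative distribution.
        midpoint = cumulative + p / 2
        # k = ceil(log2(1/p)): the smallest k with 2**-k <= p.
        k = 0
        while Fraction(1, 2 ** k) > p:
            k += 1
        length = k + 1  # codeword length ceil(log2(1/p)) + 1
        # Keep the first `length` bits of the binary expansion of midpoint.
        bits = []
        frac = midpoint
        for _ in range(length):
            frac *= 2
            bit = 1 if frac >= 1 else 0
            bits.append(str(bit))
            frac -= bit
        codes[symbol] = "".join(bits)
        cumulative += p
    return codes


def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


# Hypothetical example distribution over four symbols.
example = {
    "a": Fraction(1, 4),
    "b": Fraction(1, 2),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
example_codes = sfe_code(example)
```

Because each codeword length is within two bits of log2(1/p), a bound of this kind lets a probability assigned by a model be turned into a description length, which is the step that connects perplexity to Kolmogorov complexity in the chain described above.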
A particularly clever methodological choice is using *shortness* as a proxy for interpretability. Since shortness is a necessary (though not sufficient) condition for interpretability, proving impossibility of short explanations automatically implies impossibility of interpretable explanations. This sidesteps the notoriously difficult problem of formally defining "interpretability."
The paper also makes a non-obvious but important technical contribution in showing that the naïve choice of C(P(·|·)) as the complexity measure fails (Proposition 9 in Appendix D), and that E[C(Y|X)] is the correct quantity. This is demonstrated rigorously with a counterexample showing the gap between algorithmic complexity expectations and Shannon entropy can be arbitrarily large.
However, the result has a structural limitation: the constant *c* in the inequality is unspecified and depends on fixed computational infrastructure (universal Turing machine, interpretation function, encoding). While standard for Kolmogorov complexity results, this means the theorem provides qualitative rather than quantitative guidance—one cannot determine concrete explanation length bounds for specific systems.
3. Potential Impact
For XAI research: The quadrilemma provides theoretical justification for the design philosophy of existing explanation methods (SHAP, LIME, Grad-CAM, etc.) that deliberately sacrifice faithfulness. It redirects the field away from pursuing completely faithful explanations toward "explaining only the parts that are important for applications."
For AI governance: The paper's most practically consequential claim is that governance frameworks should be designed on the premise that AI explanations are inherently incomplete. This directly speaks to regulatory efforts like the EU AI Act and NIST AI Risk Management Framework, which emphasize explainability requirements.
For theoretical AI: The connection between Kolmogorov complexity, perplexity, and explainability opens a new theoretical angle. The observation that large-scale models achieving good performance in complex environments necessarily resist faithful short explanations connects to broader questions about compression and intelligence.
4. Timeliness & Relevance
The paper is highly timely. As LLMs and diffusion models proliferate in high-stakes applications, regulators worldwide are imposing explainability requirements. The gap between these regulatory demands and what is theoretically achievable had not previously been formally characterized. This paper fills that gap with a clean impossibility result, arriving at a moment when the AI governance community needs principled guidance on what to demand from AI explanations.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
Comparison to Prior Art:
Unlike Zhang et al. (2023) and Bilodeau et al. (2024), which show impossibilities within restricted explanation classes (removal-based, feature attribution), this result is explanation-method-agnostic and incorporates AI performance and environmental complexity into the trade-off—a significant generalization.
Overall Assessment
This is a conceptually important theoretical contribution that establishes a fundamental limit in AI explainability. The mathematical execution is clean and the implications for both the XAI research agenda and AI governance are significant. The main limitations (uncomputable quantities, unspecified constants, and the restriction to complete faithfulness) are acknowledged by the authors and represent natural directions for future work rather than flaws. The paper's impact will likely lie more in shaping the discourse around what explainability can achieve than in providing immediately actionable technical tools.
Generated May 26, 2026
Comparison History (22)
Paper 1 establishes a fundamental theoretical impossibility result (quadrilemma) for AI explainability that has broad implications across all of AI governance, regulation, and XAI research. Its impact spans policy, theory, and practice, providing a mathematical foundation that constrains what can be expected from explainable AI systems. Paper 2, while methodologically rigorous and practically useful for LLM evaluation, addresses a narrower issue in post-training evaluation methodology. The fundamental nature of Paper 1's impossibility result gives it greater potential to reshape discourse across multiple fields including AI policy, law, and computer science.
Paper 2 establishes a fundamental impossibility result (quadrilemma) for AI explainability that has broad implications across all of AI governance, regulation, and XAI research. Such theoretical impossibility results tend to have outsized impact by reshaping entire research agendas. Paper 1, while addressing an important practical problem in RAG systems with a novel approach, is more narrowly scoped to retrieval-augmented generation attribution. Paper 2's implications span policy, regulation, and the theoretical foundations of interpretability, giving it broader cross-disciplinary reach and timeliness given current AI governance debates.
Paper 1 offers a concrete, testable mechanistic account of CoT’s probe-time benefits (local co-occurrence/lexical activation) supported by controlled ablations and robustness across models/datasets, making it both methodologically rigorous and immediately actionable for prompting, evaluation, and interpretability research. Its findings can reshape how CoT is understood and used in practice. Paper 2 is timely and potentially broad for governance, but its impact hinges on the generality/assumptions of the formalization; such impossibility-style results often translate less directly into empirical practice. Overall, Paper 1 is likelier to drive near-term scientific and applied follow-up work.
Paper 2 likely has higher scientific impact due to its broad, field-spanning theoretical result: a proved “quadrilemma” placing fundamental limits on faithful, interpretable explanations for high-performing AI in complex environments. This is novel, timely for governance and regulatory debates, and applicable across essentially all modern large-scale AI systems, shaping future explainability research agendas and policy assumptions. Paper 1 is innovative and rigorously validated with wet-lab results, with strong real-world relevance to drug delivery, but its impact is narrower to lipid/LNP design and specific agentic-LLM methodology.
Paper 2 likely has higher scientific impact due to a strong combination of theory and immediate practical relevance. It introduces a formal framework for inference-time boosting in agentic/committee LLM systems, provides bounds characterizing when selection reliably improves trajectories, and identifies necessary conditions (local soundness signals) with clear implications for system design. It also demonstrates substantial empirical gains on a high-profile benchmark (SWE-bench Verified) and matches stronger proprietary models, suggesting broad applicability across coding, reasoning, and verification-heavy domains. Paper 1 is novel and governance-relevant but may be less actionable and harder to validate empirically.
Paper 2 has higher potential impact because it offers a general theoretical impossibility result (a quadrilemma) about faithful, interpretable explanations under realistic constraints, which is broadly relevant across AI systems and directly informs AI governance and policy. Its breadth and timeliness (explainability + regulation) make it likely to influence multiple fields beyond ML (law, policy, HCI, safety). Paper 1 is technically novel and actionable for LVLM architecture, but its impact is narrower (Transformer module roles in LVLMs) and may be contingent on empirical reproducibility across model families.
Paper 1 has higher potential impact because it offers a general theoretical impossibility result (a quadrilemma) about faithful, interpretable explanations under realistic constraints, which could reshape expectations and policy for AI explainability across many model classes and application domains. Its breadth (AI theory, interpretability, governance) and timeliness (regulatory focus on explainability) are strong. Paper 2 is methodologically solid and practically valuable for fMRI decoding, but its impact is more domain-specific to neuroimaging/brain decoding and less likely to broadly affect multiple fields.
Paper 1 establishes a fundamental theoretical impossibility result (quadrilemma) for AI explainability, which has broad implications across all of AI governance, policy, and XAI research. It provides a mathematical proof that constrains the entire field, similar to how No Free Lunch theorems shaped optimization. This type of foundational result impacts multiple disciplines (CS, law, policy, philosophy) and will remain relevant as AI scales. Paper 2, while technically sound, is an incremental improvement in multi-agent RL for a specific military application domain with narrower impact.
Paper 2 likely has higher impact due to a concrete, reusable evaluation framework (CUSP) and a large, multidisciplinary benchmark enabling standardized measurement and follow-on work across scientometrics, AI evaluation, and science policy. Its findings on forecasting limits, domain heterogeneity, cutoff insensitivity, and miscalibrated uncertainty are immediately actionable for model development and decision-making. Paper 1 is conceptually novel and relevant to AI governance, but its impact depends heavily on the generality of its formal assumptions and may remain more theoretical with fewer direct empirical artifacts for the community to adopt.
Paper 2 offers a foundational mathematical proof establishing an impossibility theorem (a quadrilemma) in AI explainability. By proving that environment complexity, AI performance, interpretability, and complete faithfulness cannot coexist, it broadly impacts all AI research, policy, and governance. While Paper 1 provides an exceptionally rigorous empirical analysis of synthetic data limitations, its scope is largely confined to specific NLP classification tasks. Paper 2's universal theoretical constraints on Trustworthy AI and its direct relevance to global regulatory frameworks give it a significantly higher potential scientific impact across diverse disciplines.
Paper 2 establishes a fundamental theoretical impossibility result (a quadrilemma) about AI explainability that has broad implications across all of AI, not just one application domain. Such foundational impossibility theorems (akin to the No Free Lunch theorem or Arrow's impossibility theorem) tend to have outsized and lasting scientific impact because they reshape how entire research communities frame problems. It directly informs AI governance policy and redirects explainability research efforts. Paper 1, while valuable, presents an incremental engineering contribution combining known techniques (LLMs, fuzzy logic, symbolic reasoning) for a specific medical diagnosis application, with performance only comparable to existing methods.
Paper 1 presents a fundamental mathematical proof regarding the theoretical limits of AI explainability. Its findings have broad, universal implications for AI safety, governance, and theoretical computer science across all complex models. In contrast, Paper 2 focuses on a specific, narrow engineering application for industrial maintenance dialog systems, making its overall scientific impact significantly lower.
Paper 1 establishes a fundamental theoretical impossibility result (quadrilemma) for AI explainability, which has broad implications across all of AI governance, policy, and XAI research. Its impact spans computer science, law, ethics, and public policy, and is highly timely given global regulatory efforts (EU AI Act, etc.). While Paper 2 makes a solid methodological contribution to neural surrogate models with impressive empirical gains, its scope is narrower—primarily benefiting the scientific computing and operator learning communities. Fundamental impossibility theorems tend to have lasting, field-shaping influence.
Paper 1 establishes a fundamental mathematical impossibility theorem (a quadrilemma) regarding AI explainability, which has profound and long-lasting implications for AI safety, governance, and theoretical machine learning. In contrast, while Paper 2 introduces a valuable and rigorous benchmark for LLMs in optimization, benchmarks typically have a shorter lifespan of scientific relevance as models evolve. The theoretical and cross-disciplinary breadth of Paper 1 gives it a higher potential for foundational scientific impact.
Paper 1 offers a broadly applicable theoretical result: a fundamental impossibility (quadrilemma) for fully faithful, interpretable explanations under realistic complexity/performance constraints. Such limits can reshape expectations, evaluation criteria, and governance across many AI domains, making its impact potentially wide and durable. Paper 2 is a strong, timely methods contribution for offline RL/world models with solid theory+empirics and clear applications, but its influence is more concentrated within task-centric representation learning and planning. Overall, Paper 1’s cross-cutting, foundational nature suggests higher potential scientific impact.
Paper 2 establishes a fundamental theoretical limit in AI explainability, affecting the broad fields of XAI, AI safety, and governance. By mathematically proving an inherent quadrilemma, it provides a foundational result that shapes future policy and research directions. While Paper 1 offers a strong, practical methodological advancement for LLM program synthesis and planning, Paper 2 has a much broader, overarching scientific and societal impact.
Paper 2 establishes a fundamental theoretical limit (a mathematically proven quadrilemma) in Explainable AI, impacting a much broader range of domains than Paper 1. While Paper 1 provides a valuable algorithmic advancement for Knowledge Graph querying, Paper 2 addresses a highly timely and critical issue for all large-scale AI systems, including LLMs. By proving that complete faithfulness, interpretability, high performance, and complex environments cannot coexist, Paper 2 directly influences future AI governance, regulatory frameworks, and the foundational direction of XAI research, giving it significantly higher potential for widespread scientific and societal impact.
Paper 2 establishes a fundamental theoretical impossibility result (quadrilemma) about AI explainability that has broad implications across AI governance, policy, and the entire field of explainable AI (XAI). Such impossibility theorems tend to have lasting impact by reshaping how entire research communities approach problems. It affects policy decisions and has cross-disciplinary relevance (law, ethics, computer science). Paper 1, while solid applied work on LLM agents with implicit rules, addresses a narrower problem with incremental improvements on specific benchmarks and is more likely to be superseded by future methods.
Paper 2 establishes a fundamental mathematical impossibility theorem (a quadrilemma) for AI explainability, analogous to foundational theorems in other disciplines. While Paper 1 offers a highly practical and timely engineering solution for LLM safety, Paper 2's theoretical limitations will broadly impact foundational AI research, XAI methodology, and global AI governance by redefining what is theoretically achievable. This broader scope, crossing into policy and theoretical machine learning, yields a significantly higher potential for long-lasting scientific impact.
Paper 1 has higher potential scientific impact: it offers a theoretical impossibility-style result (a quadrilemma) about faithful interpretability for high-performing AI in complex environments, which can generalize across model classes and application domains. Such foundational limits can reshape research agendas (interpretable ML, XAI evaluation) and inform governance and policy assumptions broadly. Paper 2 is timely and practically valuable for proactive agents, but its contributions (metrics, auto-annotation, reward design, infra) are more incremental/system-specific and likely to impact a narrower set of applications and methods.