The Two Boundaries: Why Behavioral AI Governance Fails Structurally
Alan L. McCann
Abstract
Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice's theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property "this program's effects comply with the governance policy." We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).
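The three-region taxonomy is ordinary set algebra. The following is a minimal sketch, assuming each boundary can be listed as a finite set of effect names. This is a simplification for illustration; the paper's formal model is its Coq development, and the effect names below are hypothetical.

```python
# Two-boundary model as set algebra. Assumption: each boundary is a finite
# set of effect names (the paper's formal model is in Coq, not Python sets).


def three_regions(expressiveness: set[str], governance: set[str]) -> dict[str, set[str]]:
    """Split the two boundaries into the governed, risk, and theater regions."""
    return {
        "governed": expressiveness & governance,  # capabilities that are covered
        "risk": expressiveness - governance,      # capabilities no policy covers
        "theater": governance - expressiveness,   # policies for capabilities that do not exist
    }


def is_coterminous(expressiveness: set[str], governance: set[str]) -> bool:
    """Coterminous governance: the two boundaries are identical."""
    return expressiveness == governance


if __name__ == "__main__":
    # Hypothetical effect names, chosen only for illustration.
    can_do = {"http.post", "db.write", "email.send"}
    covered = {"db.write", "email.send", "fs.delete"}
    print(three_regions(can_do, covered))
    print(is_coterminous(can_do, covered))  # False: both failure regions are non-empty
```

Equality of the two sets holds exactly when the risk and theater regions are both empty, which is why the criterion reduces to a yes/no test.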
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This paper introduces the "two-boundary model" for AI effect governance: the observation that every effectful system has an expressiveness boundary (what it can do) and a governance boundary (what oversight covers), and that these are almost always misaligned, creating three regions: governed capability (functional), ungoverned capability (risk), and governance theater (waste). The paper argues that Rice's theorem makes behavioral governance of effects undecidable in the general case for Turing-complete systems, and proposes "coterminous governance", an architectural pattern in which computation is separated from effect production so that the two boundaries are identical by construction. The author claims mechanized Coq proofs (454 theorems, 0 admitted) in a companion paper.
Methodological Rigor
The paper's central theoretical argument — applying Rice's theorem to show undecidability of behavioral effect governance — is mathematically sound but not novel. Rice's theorem has been well-understood since 1953, and its implications for program analysis are standard in computer science. The paper's contribution is framing this known result in the specific context of AI governance, which is useful but should be recognized as application rather than discovery.
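The undecidability argument follows the standard reduction from the halting problem. A minimal sketch, assuming a toy model in which a program's effects are the strings it appends to a log and the policy forbids one effect; `complies` is a hypothetical decider that, by the argument, cannot exist, and all names here are illustrative rather than taken from the paper.

```python
from typing import Callable

EffectLog = list[str]


def make_gadget(p: Callable[[int], object], x: int) -> Callable[[EffectLog], None]:
    """Build a program that performs the forbidden effect iff p(x) halts.

    p is an arbitrary computation with no access to the effect log, so the
    only way the forbidden effect can occur is by reaching the line after p(x).
    """
    def gadget(effects: EffectLog) -> None:
        p(x)                                # may run forever
        effects.append("forbidden_effect")  # reached only if p(x) returns
    return gadget


# If a total decider complies(program) existed, then
#     halts(p, x) == (not complies(make_gadget(p, x)))
# would decide the halting problem. We can only run the gadget, which
# shows the behavior for a p that does halt:
if __name__ == "__main__":
    log: EffectLog = []
    make_gadget(lambda n: n * 2, 21)(log)
    print(log)  # ['forbidden_effect']
```

The same construction works for any non-trivial property of effects, which is the content of Rice's theorem in this setting.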
The separation of computation from effect is presented as the key architectural insight, but this too is well-established: Haskell's IO monad, algebraic effects (Plotkin & Pretnar), capability-based security (Dennis & Van Horn, Miller), and operating system syscall interfaces all implement this separation. The paper acknowledges these precedents but claims novelty in (a) applying the separation to AI workflow governance and (b) formally proving governance properties. However, the formal proofs are entirely deferred to companion papers, making the central claims of this paper unverifiable as presented.
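The separation pattern can be illustrated with a small interpreter. This is a sketch in the spirit of the precedents listed above (an IO-monad-like split between describing and performing effects), not the paper's implementation; `Directive`, `plan_refund`, `interpret`, and the effect kinds are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Directive:
    """A description of an effect; constructing one performs nothing."""
    kind: str    # hypothetical effect kind, e.g. "db.write"
    target: str  # hypothetical target, e.g. a table or address


Policy = Callable[[Directive], bool]
Handler = Callable[[Directive], None]


def plan_refund(order_id: str) -> list[Directive]:
    """Pure computation: decides what should happen but performs nothing."""
    return [
        Directive("db.write", f"refunds/{order_id}"),
        Directive("email.send", f"customer-of/{order_id}"),
    ]


def interpret(directives: list[Directive], policy: Policy,
              handlers: dict[str, Handler]) -> list[Directive]:
    """The single effect boundary: run permitted directives, return refused ones."""
    refused: list[Directive] = []
    for d in directives:
        if d.kind in handlers and policy(d):
            handlers[d.kind](d)  # the only place in this sketch where an effect runs
        else:
            refused.append(d)    # unknown kind or policy denial: nothing executes
    return refused


if __name__ == "__main__":
    performed: list[Directive] = []
    handlers = {"db.write": performed.append, "email.send": performed.append}
    refused = interpret(plan_refund("A17"), lambda d: d.kind == "db.write", handlers)
    print(performed)  # [Directive(kind='db.write', target='refunds/A17')]
    print(refused)    # [Directive(kind='email.send', target='customer-of/A17')]
```

Because every effect goes through `interpret`, the policy check sits in the execution path rather than in a parallel monitor. In Python this is only a convention, since `plan_refund` could still call a network library directly; the structural guarantee requires a language or runtime that actually denies computation direct access to effects.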
The performance measurements (0.23 ms governed vs. 0.24 ms ungoverned) are interesting but presented with minimal methodological detail: only 50 iterations with a 5-iteration warmup, on a single platform. The claim that governance overhead is "indistinguishable from the ungoverned baseline" is stronger than the data supports given such a small sample and no statistical testing.
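One way to attach a significance test to such a latency comparison is a permutation test on the difference in means. This is a swapped-in technique for illustration, not what the paper reports, and the paper's raw per-iteration timings are not available here.

```python
import random
from statistics import mean


def permutation_p_value(a: list[float], b: list[float],
                        n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on |mean(a) - mean(b)|.

    Estimates the probability, under the null hypothesis that both samples
    come from the same distribution, of a difference at least as large as
    the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b  # new list, so shuffling does not mutate the inputs
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

Applied to the governed and ungoverned per-iteration timings, a small p-value would indicate a real difference. A large p-value would not by itself show the two are "indistinguishable"; that claim calls for an equivalence test (for example, two one-sided tests against a stated margin) and enough samples to have power.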
Several aspects raise concerns about rigor:
Potential Impact
The paper addresses a genuinely important problem: as AI systems become more agentic (tool use, autonomous action), governance of their real-world effects becomes critical. The two-boundary framing is pedagogically valuable and could influence how practitioners think about governance architecture.
However, the practical impact is limited by several factors:
1. Retrofit impossibility: The paper explicitly states that structural governance cannot be retrofitted. This means the entire existing ecosystem of AI agents would need re-architecture, an enormous practical barrier.
2. The policy decision problem remains: Even with coterminous governance, someone must write the policy that the governance boundary enforces. If the policy is "allow all effects from trusted components," the structural guarantee is vacuous. The hard problem of specifying correct policies is untouched.
3. The companion paper dependency: All formal proofs are deferred, so the paper's strongest claims cannot be judged from the paper itself. The Coq development is referenced, but the core paper rests largely on argumentation rather than proof.
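The policy decision problem (point 2) can be made concrete: structural enforcement guarantees that every executed effect satisfied the policy, and nothing more. A minimal sketch, with hypothetical directive fields and component names:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Directive:
    source: str  # hypothetical: the component that emitted the directive
    kind: str    # hypothetical effect kind, e.g. "db.drop"


TRUSTED = {"planner", "executor"}  # hypothetical component names


def allow_trusted(d: Directive) -> bool:
    """The policy 'allow all effects from trusted components'."""
    return d.source in TRUSTED


def enforcement_holds(executed: list[Directive],
                      policy: Callable[[Directive], bool]) -> bool:
    """The structural guarantee: every executed directive satisfied the policy."""
    return all(policy(d) for d in executed)


executed = [Directive("planner", "db.drop"), Directive("executor", "funds.transfer")]
print(enforcement_holds(executed, allow_trusted))  # True, although nothing was restricted
```

The guarantee holds, but because the policy constrains only the source and never the kind of effect, it holds vacuously.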
Timeliness & Relevance
The paper is highly timely. The rapid deployment of agentic AI systems (tool-using LLMs, autonomous agents) has created genuine governance challenges. The distinction between governing what models *say* versus what they *do* is important and underappreciated. The paper arrives at a moment when the AI safety community is grappling with exactly these questions.
The connection to Dalrymple et al.'s "guaranteed safe AI" framework is apt and positions this work within an emerging research direction emphasizing architectural rather than behavioral safety.
Strengths
1. Clear conceptual framing: The two-boundary model and three-region taxonomy (governed, risk, theater) are memorable and useful abstractions that could enter common parlance.
2. Important distinction: Separating effect governance from content governance is a valuable conceptual contribution that clarifies muddled discourse.
3. Comprehensive related work: The paper situates itself well within programming language theory, security, and AI safety literatures.
4. The coterminous governance criterion: Proposing a testable yes/no criterion for governance completeness is practically useful.
Limitations & Weaknesses
1. Limited novelty: The core ideas (Rice's theorem implications, computation/effect separation, capability-based security) are well-established. The synthesis is useful but the individual components are not new.
2. Deferred proofs: The paper's strongest claims rest on companion papers. Without examining the Coq development, the formal claims cannot be evaluated.
3. Overly binary framing: Real governance operates on a spectrum. The paper's absolutist stance ("zero risk by construction" vs. behavioral approaches) understates the value of probabilistic approaches and overstates what structural governance achieves in practice.
4. Missing empirical validation: No case studies of real AI systems analyzed through the two-boundary lens, no comparison of governance failures that would/wouldn't be caught.
5. Single-author ecosystem: Five companion papers by the same author, all cited as "to appear," raise concerns about independent validation.
6. The semantic gap persists: Even with structural governance, the governance boundary must evaluate directives against policy — and if the policy involves semantic properties of the directive's *purpose*, Rice-like difficulties resurface at this level.
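Weakness 6 can be illustrated by contrasting what a field-level policy can and cannot see. A policy written over a directive's fields is a total predicate over finite data, so it always terminates with a yes/no answer, but the same directive can be emitted for entirely different purposes. The field names and limits below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    payee: str         # hypothetical field
    amount_cents: int  # hypothetical field


APPROVED_PAYEES = {"acme-supplies", "payroll"}  # hypothetical allowlist
LIMIT_CENTS = 50_000                            # hypothetical per-payment cap


def syntactic_policy(p: Payment) -> bool:
    """Decidable check over directive fields only: always terminates."""
    return p.payee in APPROVED_PAYEES and 0 < p.amount_cents <= LIMIT_CENTS


# The same directive can be emitted to pay a legitimate invoice or as one
# step of a fraudulent scheme; the predicate receives identical input either
# way, so it cannot distinguish the two purposes.
directive = Payment("acme-supplies", 12_000)
print(syntactic_policy(directive))  # True, regardless of the intent behind it
```

Moving the check to the directive level makes it decidable only because the policy stops asking about purpose; a policy that does ask about purpose reintroduces the semantic question the architecture was meant to avoid.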
Overall Assessment
This is a well-written position paper that provides a useful conceptual framework for thinking about AI effect governance. Its main value is pedagogical and architectural: clarifying the distinction between content and effect governance, and arguing for architectural solutions. However, the theoretical contributions are applications of known results, the formal proofs are entirely deferred, and the practical implications require re-architecting all existing systems. The paper would be significantly strengthened by concrete case studies, available proofs, and a more nuanced treatment of the behavioral/structural spectrum.
Generated May 5, 2026
Comparison History (31)
Paper 2 has higher potential impact: it introduces a broadly applicable formal framework for AI governance of real-world effects, connects the core failure mode to a fundamental computability limit (Rice’s theorem), and proposes an architectural criterion (coterminous governance) with mechanized Coq proofs, indicating high methodological rigor. Its implications span AI safety, security, formal methods, programming languages, and policy, making it timely and cross-disciplinary. Paper 1 is a solid ML contribution with practical relevance, but is more incremental within a fast-moving VLM/RL exploration literature and narrower in breadth.
Paper 1 offers a fundamental theoretical critique and formal framework for AI safety and governance, backed by mechanized Coq proofs. By invoking Rice's theorem to prove the undecidability of behavioral governance, it challenges the core architecture of current AI systems. This structural insight has profound, long-term implications for AI alignment and system design, offering broader and more paradigm-shifting scientific impact than Paper 2's valuable but incremental methodological improvements to LLM fine-tuning.
Paper 2 offers a foundational, mathematically rigorous critique of current AI governance approaches, backed by fully mechanized Coq proofs. By applying Rice's theorem to prove the structural failure of behavioral AI governance and proposing a provably safe architectural alternative, it has the potential for deep, cross-disciplinary impact on agentic AI design, safety, and regulation, surpassing the narrower (though practical) optimization improvements of Paper 1.
Paper 2 addresses a fundamental, field-wide challenge in AI governance by formally proving structural limitations of current behavioral governance approaches using Rice's theorem. Its use of mechanized Coq proofs adds immense methodological rigor. The proposal of 'coterminous governance' has broad implications for the architecture of safe AI agents. In contrast, Paper 1 focuses on a niche economic problem of pricing language data assets, which has narrower theoretical and practical applications.
Paper 2 addresses a fundamental and critical issue in AI safety and governance with broad architectural implications. By leveraging computation theory (Rice's Theorem) and providing Coq-mechanized formal proofs, it offers exceptional methodological rigor and establishes a structural paradigm shift (coterminous governance). In contrast, Paper 1 focuses on a more niche operations research problem applied to language data pricing, which has narrower impact and less foundational significance.
Paper 1 offers a more novel, foundational contribution by formalizing a structural limitation of behavioral AI governance using Rice’s theorem and proposing an architectural criterion (coterminous governance) with mechanized Coq proofs, suggesting high methodological rigor and broad relevance to AI safety, systems design, and policy. Its implications generalize across many AI agent/tooling architectures and are timely given real-world deployment of tool-using models. Paper 2 is a solid applied ML advance in heterogeneous graph learning, but likely more incremental within an active subfield and with narrower cross-domain impact.
Paper 2 has higher potential impact: it introduces a general, formally grounded framework explaining structural limits of behavioral AI governance using Rice’s theorem, and proposes an architectural criterion (coterminous governance) with mechanized Coq proofs, suggesting broad applicability to AI safety, security, programming languages, and systems design. Its claims are timely and relevant to real-world deployment of tool-using agents and could reshape governance architectures. Paper 1 is a solid methodological contribution to heterogeneous graph learning robustness, but is more incremental within an active subfield and likely has narrower cross-disciplinary reach.
Paper 2 likely has higher near-term scientific impact: it introduces an open-source benchmark suite for specification gaming, provides empirical results across multiple frontier models, and identifies actionable drivers (RL reasoning training, budget effects, mitigations). This is timely, directly usable by the community, and can influence both evaluation practice and training methodology across labs. Paper 1 is conceptually deep and formally rigorous, but its central undecidability framing may be seen as reframing known limits (Rice’s theorem) with narrower immediate adoption, potentially reducing short-term breadth despite high theoretical value.
Paper 2 is likely to have higher near-term scientific impact: it provides an open-source benchmark suite, systematic empirical findings across multiple frontier models, and actionable insights about how RL reasoning training affects specification gaming—directly usable by labs for evaluation and training decisions. Its applications are immediate and broadly relevant to alignment, agent safety, and RLHF/RLAIF practice. Paper 1 is conceptually novel and formally rigorous (Coq mechanization) with a strong theoretical framing, but its real-world impact depends on adoption of specific architectures and may be narrower and slower-moving.
Paper 2 offers a profound theoretical contribution to AI safety and agent governance by formalizing the structural limits of current approaches using Rice's theorem. Its introduction of 'coterminous governance' as a necessary architectural shift, backed by exceptionally rigorous mechanized Coq proofs, gives it foundational importance. While Paper 1 provides a useful HCI tool for prompt engineering, Paper 2 addresses a critical, urgent challenge in AI agent safety with broad implications across formal verification, AI architecture, and policy, giving it substantially higher potential scientific impact.
Paper 2 presents a novel formal framework for AI governance with mechanized proofs (454 Coq theorems), introducing the concept of 'coterminous governance' grounded in Rice's theorem. It addresses a fundamental structural problem in AI safety/governance with broad implications across policy, engineering, and formal methods. Paper 1, while methodologically sound, is primarily an empirical benchmark of existing LLMs on a specific task—useful but incremental, with findings that will quickly become outdated as models evolve. Paper 2's theoretical contribution has longer-lasting significance and broader cross-disciplinary impact.
Paper 2 has higher potential impact due to a broadly applicable, theoretically grounded framework for AI governance that leverages an undecidability result (Rice’s theorem) and proposes a testable architectural criterion (coterminous governance). Its mechanized Coq proofs suggest strong methodological rigor, and the ideas generalize across many AI systems beyond a single task domain. Paper 1 is timely and practically relevant but narrower in scope (fraud-advice interactions) and more contingent on current LLM behavior and experimental design, limiting cross-field breadth and long-term influence.
Paper 1 offers a foundational theoretical breakthrough rather than an incremental benchmark. By applying Rice's Theorem to AI governance and mechanizing the proofs in Coq, it rigorously demonstrates why current behavioral governance structurally fails. Introducing 'coterminous governance' as an architectural requirement presents a paradigm-shifting approach to AI safety with profound, long-lasting implications across computer science and policy. While Paper 2 provides a valuable and timely evaluation suite (Claw-Eval), benchmarks are frequently superseded, whereas Paper 1's provable, structural framework delivers a deeper, more enduring scientific impact.
Paper 2 addresses a fundamental, cross-cutting problem in AI governance with formal rigor (454 Coq-verified theorems), establishing a provable impossibility result via Rice's theorem and proposing a testable architectural criterion (coterminous governance). Its implications span AI safety, policy, systems architecture, and formal verification—fields of enormous current importance. While Paper 1 is a solid engineering contribution combining LLMs with mechanical design optimization, its scope is narrower (linkage design) and its methods are more incremental. Paper 2's formal framework could reshape how the entire field thinks about AI governance architecture.
Paper 2 demonstrates exceptional methodological rigor through mechanized Coq proofs and applies foundational computer science theory (Rice's Theorem) to a critical, timely issue. By mathematically proving the structural failure of current behavioral AI governance and proposing a verifiable architectural solution, it offers a definitive, highly actionable paradigm shift for AI safety with massive real-world implications.
Paper 2 presents a novel formal framework for AI governance grounded in Rice's theorem, with mechanized Coq proofs (454 theorems), establishing a fundamental theoretical result about the structural impossibility of behavioral AI governance. This addresses a critical, timely problem with broad implications across AI safety, policy, and systems architecture. Paper 1, while practical, is more incremental—applying known cognitive science concepts to LLM memory management with modest empirical gains. Paper 2's formal rigor, foundational nature, and cross-disciplinary relevance (CS theory, policy, AI safety) give it higher potential for lasting scientific impact.
Paper 2 provides novel empirical evidence of a concrete, previously undemonstrated security threat—subliminal transfer of unsafe behaviors during AI agent distillation—with immediate implications for AI safety and deployment. Its findings that keyword filtering is insufficient to prevent behavioral bias transfer are directly actionable and alarming for the rapidly growing field of agent-based AI. Paper 1, while intellectually rigorous with Coq-verified proofs, presents a largely theoretical governance framework whose practical adoption faces significant barriers. Paper 2's empirical novelty and direct relevance to urgent AI safety concerns give it broader and more timely impact.
Paper 1 presents a novel formal framework for AI governance with foundational theoretical contributions—applying Rice's theorem to prove undecidability of behavioral governance, introducing 'coterminous governance' as a new concept, and providing mechanized Coq proofs (454 theorems). This addresses a fundamental structural problem in AI safety/governance with broad implications across all deployed AI systems. Paper 2 offers an incremental improvement (1.18% average) on KGQA tasks with a narrower scope. Paper 1's cross-disciplinary impact (AI safety, formal verification, policy) and timeliness given rapid AI deployment give it substantially higher potential impact.
MathNet provides a large-scale, publicly available benchmark resource (30,676 problems, 47 countries, 17 languages) that will likely be widely adopted by the AI research community for evaluating mathematical reasoning and retrieval capabilities. Benchmarks of this quality and scale tend to accumulate high citations and drive research agendas. Paper 1 presents an intellectually interesting formal framework for AI governance with Coq proofs, but its highly theoretical nature, narrow focus on effect governance, and lack of empirical validation limit its near-term adoption and breadth of impact across the research community.
Paper 1 presents a novel formal framework for AI governance with fundamental theoretical contributions—applying Rice's theorem to prove structural undecidability of behavioral governance and proposing 'coterminous governance' as a new architectural paradigm. The 454 Coq-mechanized theorems demonstrate exceptional rigor. Its implications span all deployed AI systems, making it broadly impactful across AI safety, formal methods, and policy. Paper 2, while practically useful, is a domain-specific application combining existing techniques (LLMs, deep learning classifiers) for speech therapy—incremental rather than foundational.