Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

Amjad Ibrahim, Yong Li

Jun 2, 2026

arXiv:2606.03518v1

cs.AI (primary), cs.CR

#1626 of 3355 · Artificial Intelligence

Tournament Score: 1408 ± 45 (scale 1050 to 1800)

Win Rate: 50%

Rating: 6.8 / 10

Significance 7.5 · Rigor 6.5 · Novelty 7 · Clarity 7

Abstract

As AI systems evolve from passive models into autonomous active agents capable of initiating actions, collaborating, and delegating tasks, the traditional boundaries of software systems blur. Traditional authorization and delegation frameworks, built around fixed principals, explicit requests, and static scopes, are insufficient to govern agentic systems. Agentic AI demands richer authorization semantics: agents must inherit and delegate permissions, act under time-limited authority, and coordinate through shared protocols. Existing Identity and Access Management (IAM) systems fail to fully capture this notion of agency, lacking mechanisms for recursive delegation, contextual boundaries, and dynamic scoping as executable governance primitives. Unlike access delegation standards such as OAuth 2.0, we treat delegation as a contractual term rather than merely a static token-based consent credential. This paper proposes a compositional governance framework that introduces primitives indispensable for agentic AI. We define types of delegation and their permissions and accountability implications, and we introduce a notion of resource scope attenuation to bound agentic access envelopes. These concepts are expressed as general relational definitions that can be composed into existing authorization domains (e.g., financial systems). To operationalize this composition, we define a compositional operator that overlays new agentic semantics, such as recursive delegation chains, onto existing relational policies without rewriting them. We substantiate this framework through formal proofs and empirical evaluation, showing that it provides a formal yet practical foundation for accountable authorization in agentic AI systems.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper addresses the authorization and access control challenges arising from agentic AI systems—autonomous agents that act on behalf of users, delegate tasks to sub-agents, and interact across trust boundaries. The core contribution is threefold: (1) a taxonomy of delegation types (full, scoped, conditional, depth-bounded, temporal, group) formalized as relational predicates rather than static tokens; (2) a compositional overlay operator that injects agentic governance primitives (delegation chains, scope envelopes) into existing Relation-Based Access Control (ReBAC) schemas without rewriting domain-specific policies; and (3) formal proofs of soundness properties—namely that domain principal permissions are preserved and that agent authorization is always traceable to a human-rooted delegation chain with valid scope.

The key conceptual shift is treating delegation as a *contractual runtime predicate* rather than a static credential (as in OAuth 2.0). The paper introduces an "authorization envelope"—the intersection of delegated authority and contextual scope—which dynamically constrains what an agent can do. The compositional operator, inspired by double-pushout (DPO) graph rewriting, enables reuse of existing enterprise authorization models.
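The envelope idea can be illustrated with a minimal sketch. It assumes permissions are modeled as (action, resource) pairs and that a delegation carries an expiry time; the names `Delegation`, `envelope`, and `is_allowed` are hypothetical and are not taken from the paper or from OpenFGA.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical encoding of a permission as an (action, resource) pair.
Permission = tuple[str, str]


@dataclass(frozen=True)
class Delegation:
    """Authority a delegator hands to an agent, valid until `expires_at`."""
    delegator: str
    agent: str
    granted: frozenset[Permission]
    expires_at: datetime


def envelope(d: Delegation, scope: frozenset[Permission], now: datetime) -> frozenset[Permission]:
    """Return the delegated authority intersected with the contextual scope.

    An expired delegation yields an empty envelope, so the result depends on
    the time of the request rather than on a credential issued earlier.
    """
    if now >= d.expires_at:
        return frozenset()
    return d.granted & scope


def is_allowed(d: Delegation, scope: frozenset[Permission], now: datetime,
               action: str, resource: str) -> bool:
    """Check one request against the envelope computed at request time."""
    return (action, resource) in envelope(d, scope, now)


grant = Delegation(
    delegator="alice",
    agent="agent-1",
    granted=frozenset({("read", "doc:1"), ("write", "doc:1"), ("read", "doc:2")}),
    expires_at=datetime(2026, 6, 3, tzinfo=timezone.utc),
)
task_scope = frozenset({("read", "doc:1"), ("read", "doc:3")})
before_expiry = datetime(2026, 6, 2, tzinfo=timezone.utc)
after_expiry = datetime(2026, 6, 4, tzinfo=timezone.utc)

print(is_allowed(grant, task_scope, before_expiry, "read", "doc:1"))
print(is_allowed(grant, task_scope, before_expiry, "write", "doc:1"))
print(is_allowed(grant, task_scope, after_expiry, "read", "doc:1"))
```

Only `("read", "doc:1")` appears in both the grant and the task scope, so it is the only request that passes, and only while the delegation is still valid.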

Methodological Rigor

The formal development is reasonably rigorous. The paper defines a typed ReBAC schema, specifies applicability conditions (A1–A6) for the overlay operator, and proves three key results: conservative extension for domain principals (Lemma 5.1), human-rooted delegation (Lemma 5.2), and agent-authorization soundness (Theorem 5.3). The proofs, while not deeply complex, are structurally sound and follow naturally from the construction. The corollary on revocation is immediate but practically important.
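The human-rooted delegation property and the revocation corollary can be pictured with a small sketch. It makes a simplifying assumption that each delegate has at most one delegator, stored in a plain dictionary; the function name `trace_to_human` and the `max_depth` parameter are hypothetical and do not reproduce the paper's formal definitions.

```python
from typing import Optional


def trace_to_human(agent: str, delegated_by: dict[str, str], humans: set[str],
                   max_depth: int = 8) -> Optional[list[str]]:
    """Follow delegation edges upward from `agent`.

    Returns the chain [agent, ..., human] if it reaches a human within
    `max_depth` hops; returns None for a broken chain, a cycle, or a chain
    that is too deep.
    """
    chain = [agent]
    current = agent
    for _ in range(max_depth):
        if current in humans:
            return chain
        parent = delegated_by.get(current)
        if parent is None or parent in chain:
            return None  # no delegator recorded, or a cycle was detected
        chain.append(parent)
        current = parent
    return chain if current in humans else None


humans = {"alice"}
edges = {"sub-agent": "agent", "agent": "alice"}  # delegate -> delegator

print(trace_to_human("sub-agent", edges, humans))

# Revocation: removing one edge invalidates every chain that passed through it.
revoked = {k: v for k, v in edges.items() if k != "agent"}
print(trace_to_human("sub-agent", revoked, humans))
```

In this toy model, revoking the `agent` edge leaves `sub-agent` with no path to a human, which mirrors why the revocation corollary follows directly from the chain requirement.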

The empirical evaluation compares baseline domain configurations against overlay-augmented configurations using OpenFGA across two representative use cases (Google Drive and Slack). The methodology is reasonable—1000 operations per scenario, 80/20 check/write split for overlay runs, seeded randomness for reproducibility. Results show median check latencies remain under 7ms even at scale (786K tuples for G8), with memory overhead ratios between 0.95–1.20. The divergence between mean and median latencies at scale (mean ratios up to 2.20×) reveals tail effects that deserve more investigation but are acknowledged.
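To see why a mean ratio can rise well above the median ratio, the sketch below generates synthetic latencies with a small slow tail and a seeded 80/20 check/write workload. All numbers are invented for illustration and are not the paper's measurements; `make_workload`, `simulate_latencies`, and `summarize` are hypothetical helpers, not part of the paper's benchmark artifacts.

```python
import random
import statistics


def make_workload(n: int, check_fraction: float, seed: int) -> list[str]:
    """Build a reproducible operation mix, e.g. 80% checks and 20% writes."""
    rng = random.Random(seed)
    return ["check" if rng.random() < check_fraction else "write" for _ in range(n)]


def simulate_latencies(n: int, tail_fraction: float, seed: int) -> list[float]:
    """Draw n latencies in ms; a `tail_fraction` of them fall in a slow tail."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < tail_fraction:
            samples.append(rng.uniform(50.0, 200.0))  # rare slow requests
        else:
            samples.append(rng.uniform(2.0, 6.0))  # typical requests
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Report median, mean, and an approximate 99th percentile."""
    ordered = sorted(samples)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p99": p99,
    }


workload = make_workload(1000, check_fraction=0.8, seed=42)
baseline = summarize(simulate_latencies(1000, tail_fraction=0.0, seed=42))
with_tail = summarize(simulate_latencies(1000, tail_fraction=0.03, seed=42))

print(workload.count("check"), workload.count("write"))
print(baseline)
print(with_tail)
```

A few percent of slow requests barely move the median but pull the mean upward, which is why reporting a high percentile alongside the median gives a clearer picture of tail behavior.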

However, several methodological concerns arise. The evaluation uses synthetic topologies with controlled templates, which may not capture the complexity of real-world delegation patterns. The paper acknowledges this limitation but does not quantify its impact. The claim of DPO-style rewriting is loosely stated—the paper sets K=L (non-deleting), simplifying the construction significantly, and the connection to formal graph transformation theory could be more precisely developed. Additionally, the depth-bounded delegation and group delegation types are described but their encoding is acknowledged as requiring schema expansion or external predicates, leaving them somewhat underspecified.
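For readers less familiar with the graph-rewriting reference, the following recalls the standard textbook DPO diagram and what the choice K = L implies. It is a sketch of the general construction, assuming the usual notation (rule span L, K, R; host graph G; match m); it is not the paper's own definition of the overlay operator.

```latex
% Requires amsmath for \xleftarrow and \xrightarrow.
% A DPO rule is a span L <- K -> R; applying it at a match m : L -> G
% builds two pushout squares:
\[
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
\downarrow{\scriptstyle m} & & \downarrow{\scriptstyle d} & & \downarrow{\scriptstyle m'} \\
G & \xleftarrow{\;l'\;} & D & \xrightarrow{\;r'\;} & H
\end{array}
\]
% Non-deleting case: K = L and l = id_L. The left square is then a pushout
% with D isomorphic to G, so nothing in the matched part of G is removed:
\[
K = L,\; l = \mathrm{id}_L
\;\Longrightarrow\;
D \cong G,\quad l' \cong \mathrm{id}_G,\quad H \cong G +_{L} R .
\]
```

In this special case the rewrite step reduces to a single pushout that glues R onto G along L, so the result only adds structure. That is consistent with the review's remark that setting K = L simplifies the construction considerably.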

Potential Impact

The paper addresses a genuinely important and timely problem. As agentic AI systems proliferate across enterprise settings (coding assistants, financial agents, healthcare monitors), the lack of principled authorization frameworks poses real security risks. The compositional approach—allowing organizations to overlay agentic governance onto existing policies—has significant practical appeal, as it avoids the prohibitive cost of redesigning authorization systems from scratch.

The framework's grounding in OpenFGA/Zanzibar provides a clear implementation path. The open-source artifacts (generators, benchmarks, models) enhance reproducibility and potential adoption. The Agent Controller Engine (ACE) architecture offers a concrete integration blueprint for enterprise deployments.

Cross-domain applicability is a strength: the overlay can be applied to document systems, code repositories, collaboration platforms, and potentially other domains. The framework's relevance to prompt injection defense is noteworthy—by constraining agents' authorization envelopes, it limits the blast radius of successful attacks regardless of their origin.

The impact on adjacent fields includes: (i) IAM/security engineering, where the relational formalization of delegation types could influence standards development; (ii) multi-agent systems research, where the accountability and traceability properties address known governance gaps; and (iii) protocol design (MCP, A2A), where the framework offers richer semantics than currently available.

Timeliness & Relevance

The paper is exceptionally timely. Agentic AI is rapidly moving from research prototypes to production deployments, and authorization is widely recognized as a critical unsolved challenge. The OWASP threat model for agentic AI, the OpenID Foundation's analysis of OAuth limitations for agents, and industry protocols like MCP and A2A all point to an urgent need for the kind of framework this paper proposes. The paper directly engages with these contemporary developments and positions its contribution precisely at the gap between existing standards and agentic requirements.

Strengths

1. Well-motivated problem framing: The requirements (RQ1–RQ6) are clearly articulated and grounded in real agentic scenarios.

2. Compositionality: The overlay operator is the paper's most distinctive contribution—enabling reuse rather than replacement of existing authorization infrastructure.

3. Formal guarantees: The soundness theorem (Theorem 5.3) ensures that every agent authorization traces to a human-rooted delegation chain with valid scope, so agents cannot gain access beyond what was delegated.

4. Practical grounding: Implementation in OpenFGA with reproducible benchmarks bridges theory and practice.

5. Comprehensive treatment: The paper covers delegation taxonomy, formal model, composition operator, architecture, use case, proofs, and benchmarks.

6. Open artifacts: A full reproducibility package is provided.

Limitations

1. Scope attenuation underspecified: The paper acknowledges that semantic ordering for attenuation is domain-specific but provides no mechanism for verifying it, which is arguably the hardest part of the problem.

2. Limited real-world validation: No deployment with actual agentic systems or real users; the coding assistant use case remains illustrative.

3. Scalability of delegation chains: The empirical evaluation uses bounded chains (0–2 hops); behavior with deeper recursive delegations in adversarial settings is untested.

4. Single-engine dependency: While framed generally, the implementation and evaluation are tightly coupled to OpenFGA, and portability to other authorization engines is unclear.

5. Sequential/multi-policy composition: Acknowledged as future work, but this is critical for realistic multi-domain scenarios.

6. Tail latency concerns: Mean latency ratios reaching 2.2× at scale suggest potential issues for latency-sensitive applications that are not fully addressed.

7. The DPO connection is underdeveloped: The formal relationship to graph transformation theory is more suggestive than rigorous.

Overall Assessment

This is a solid systems-oriented security paper that identifies a real and pressing problem and offers a principled, practically grounded solution. The compositional overlay is genuinely novel in this context, and the combination of formal guarantees with empirical evaluation strengthens the contribution. The paper would benefit from real-world deployment validation and deeper treatment of scope attenuation, but it provides a strong foundation for an important emerging area.

Rating: 6.8 / 10

Significance 7.5 · Rigor 6.5 · Novelty 7 · Clarity 7

Generated Jun 3, 2026

Comparison History (18)

vs. Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

gemini-3.1 · 6/5/2026

Paper 2 has higher potential impact due to its broad interdisciplinary reach across HCI, psychology, and public policy. While Paper 1 offers a rigorous technical framework for AI security, Paper 2 addresses a profound societal issue: how routine AI use inadvertently alters fundamental human connections. Backed by a large-scale longitudinal study in collaboration with OpenAI, its empirical findings on behavioral shifts will likely trigger significant academic discourse, attract mainstream attention, and directly shape future AI regulations, resulting in a much larger overall scientific footprint.

vs. FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games

claude-opus-4.6 · 6/5/2026

FALSIFYBENCH addresses a fundamental question about LLM reasoning capabilities relevant to scientific discovery, introducing a novel benchmark grounded in established cognitive science (Wason task). It evaluates 12 models with fine-grained analysis revealing actionable insights about confirmation bias in LLMs. This has broad impact across AI safety, cognitive science, and scientific automation. Paper 2 addresses an important but narrower engineering problem in authorization for agentic AI. While practically useful, it is more incremental, extending existing IAM frameworks rather than revealing fundamental insights about AI capabilities.

vs. Learning Admissible Heuristics via Cost Partitioning

claude-opus-4.6 · 6/5/2026

Paper 1 presents a genuinely novel contribution at the intersection of machine learning and AI planning—the first learned heuristic with guaranteed admissibility. This addresses a fundamental open problem by combining cost partitioning theory with deep learning in a principled way (Lagrangian duality, structural graph features, architectural guarantees). It has broad impact across planning, combinatorial optimization, and ML theory. Paper 2 addresses an important practical problem in agentic AI governance but is more incremental, extending existing authorization frameworks with new compositional primitives. While timely, it is narrower in scientific scope and more engineering-oriented.

vs. Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

gemini-3.1 · 6/5/2026

Paper 1 introduces foundational, formally proven primitives for agentic AI authorization, addressing a critical gap in traditional IAM systems. By formalizing recursive delegation and dynamic scoping as compositional operators rather than static tokens, it provides a theoretical breakthrough in AI security. While Paper 2 offers a rigorous and practical enterprise testing framework, Paper 1's generalizable theoretical contributions and formal proofs are likely to have a broader and longer-lasting scientific impact on how autonomous AI systems are architected and governed across all domains.

vs. Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

gemini-3.1 · 6/3/2026

Paper 2 addresses a fundamental bottleneck in the deployment of autonomous AI: security and governance. By providing a formal, compositional authorization framework for agent delegation, its impact spans across all domains where agentic AI is deployed (e.g., enterprise, finance, healthcare). While Paper 1 introduces an interesting metric for coding agents, Paper 2's focus on formal proofs for AI safety and broad real-world applicability gives it a higher potential for widespread scientific and practical impact.

vs. TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

gemini-3.1 · 6/3/2026

Paper 2 addresses a critical, emerging bottleneck in AI safety and security by formalizing authorization for agentic AI. While Paper 1 offers strong algorithmic speedups for LLM inference, Paper 2 provides foundational governance primitives that cross-cut disciplines, enabling safe, real-world deployment of autonomous systems with broader long-term societal and scientific impact.

vs. Forget Attention: Importance-Aware Attention Is All You Need

gpt-5.2 · 6/3/2026

Paper 2 likely has higher scientific impact: it proposes a concrete, easily adoptable architectural modification (score-level SSM-attention fusion) with strong empirical gains and no custom kernels, making integration into existing Transformer stacks straightforward. Its timeliness is high given intense interest in efficient long-context and hybrid SSM/attention models, and improvements on widely used benchmarks suggest broad applicability across NLP and sequence modeling. Paper 1 addresses an important emerging governance need and claims formal rigor, but its impact may be narrower and more dependent on adoption by specific IAM/agent platforms.

vs. What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

gpt-5.2 · 6/3/2026

Paper 2 is likely to have higher impact due to timeliness and breadth: abstention competence targets a widely observed failure mode in current autonomous agents and can directly reshape benchmark design, evaluation practice, and safety engineering across labs and products. It introduces a clear taxonomy and concrete, adoptable metrics/protocols, supported by multi-model empirical results on many enterprise scenarios—facilitating replication and immediate real-world use. Paper 1 is novel and potentially important for governance, but its impact may be narrower and slower-moving because it depends on integration into IAM/policy ecosystems and organizational adoption of new authorization semantics.

vs. Solipsistic Superintelligence is Unlikely to be Cooperative

gemini-3.1 · 6/3/2026

Paper 2 addresses a critical, immediate need in agentic AI (authorization and delegation) with strong methodological rigor, including formal proofs and empirical evaluation. While Paper 1 provides a valuable conceptual framework for future AI alignment, Paper 2 offers concrete, actionable technical solutions that can be integrated into current systems, giving it a clearer path to near-term, widespread impact in AI engineering and security.

vs. InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain

gpt-5.2 · 6/3/2026

Paper 2 likely has higher near-term scientific impact: it introduces a concrete, broadly applicable RL reward (answer-conditioned information gain) for training long-context memory agents, an active and widely relevant area in LLM research. The method is directly testable, code is released, and improvements can be adopted across many long-context/agent pipelines, boosting reproducibility and uptake. Paper 1 is novel and timely for agent governance, but authorization frameworks tend to have slower adoption and narrower immediate empirical traction, with impact depending on integration into standards and real-world IAM ecosystems.

vs. Leveraging BART to Assess CS1 C++ Programming Assignments using Rubric-based Criteria

gemini-3.1 · 6/3/2026

Paper 2 addresses a critical, highly timely challenge in AI safety and cybersecurity: governing and authorizing autonomous AI agents. Its compositional framework and formal proofs offer foundational primitives for a rapidly growing field with broad cross-domain applications in enterprise and security. In contrast, Paper 1 presents a solid but more incremental and domain-specific application of existing NLP techniques to automated grading in computer science education.

vs. PyraMathBench: Evaluating and Improving Mathematical Capability in Large Language Models

gpt-5.2 · 6/3/2026

Paper 2 has higher estimated impact due to greater novelty and cross-domain relevance: it introduces a formal, compositional authorization/delegation framework tailored to agentic AI—an emerging, timely problem spanning security, distributed systems, and AI governance. Its focus on reusable primitives (recursive delegation, scope attenuation) and an overlay operator suggests broad applicability to real-world IAM deployments with strong methodological rigor via formal proofs plus empirical evaluation. Paper 1 is valuable but more incremental: a new benchmark and training/tool-use enhancements for LLM math, likely impactful within LLM evaluation but narrower in breadth and longer-term foundational influence.

vs. OctoT2I: A Self-Evolving Agentic Text-to-Image Router

claude-opus-4.6 · 6/3/2026

Paper 1 addresses a fundamental and broadly applicable challenge in AI governance—authorization and delegation for autonomous agents—which is critical as agentic AI proliferates across industries. Its compositional framework with formal proofs offers lasting theoretical contributions applicable to finance, healthcare, and any domain requiring accountable AI. Paper 2, while technically sound and practically useful for T2I routing, addresses a narrower problem with more incremental impact. The governance framework's breadth of impact across fields, timeliness given rapid agentic AI deployment, and foundational nature give it higher long-term scientific impact.

vs. SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

claude-opus-4.6 · 6/3/2026

Paper 1 (SIRI) presents a novel, empirically validated framework for skill internalization in LLM agents with clear quantitative improvements over strong baselines. It addresses a practical and timely problem in LLM agent training with a complete methodology (discovery, validation, internalization) and demonstrates significant gains on established benchmarks. Paper 2 addresses an important governance/authorization problem for agentic AI but is more incremental in nature, extending existing IAM/OAuth concepts to agentic settings. While relevant, its impact is narrower and more domain-specific compared to SIRI's broader applicability to LLM agent training.

vs. CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations

gpt-5.2 · 6/3/2026

Paper 1 is more likely to have higher broad scientific impact: it targets a foundational, cross-domain problem (authorization/delegation for agentic systems) with a compositional framework that can overlay onto existing IAM policies, suggesting wide applicability across AI deployments, cybersecurity, finance, and governance. Its emphasis on formal relational definitions, compositional operators, proofs, and empirical evaluation indicates higher methodological rigor and potential to become a standard reference as agentic AI adoption grows. Paper 2 is timely and impactful for drug discovery, but its scope is narrower to Cell Painting workflows and specific modeling pipelines.

vs. Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

claude-opus-4.6 · 6/3/2026

Traj-Evolve addresses a concrete, high-impact clinical problem (lung cancer early detection) with a novel combination of experience-based memory, multi-agent reinforcement learning, and retrieval-augmented reasoning over longitudinal EHRs. Its methodological contributions—self-evolving agents, experience pools, and MARL-based optimization—are broadly applicable to other clinical trajectory modeling tasks. Paper 1 proposes a theoretical governance framework for agentic AI authorization, which is timely but more niche and infrastructural. Paper 2's empirical results on a real clinical task, novel architectural contributions, and direct potential to improve patient outcomes give it broader and more immediate scientific impact.

vs. Before the Model Learns the Bug: Fuzzing RLVR Verifiers

claude-opus-4.6 · 6/3/2026

Paper 1 addresses a critical and timely vulnerability in RLVR training pipelines—buggy verifiers that models can exploit during optimization. This is directly relevant to the rapidly growing field of LLM alignment via reinforcement learning (e.g., DeepSeek-R1, Qwen). The fuzzing framework is practical, immediately applicable, and addresses a fundamental trust issue in reward modeling. Paper 2 proposes a governance framework for agentic AI authorization, which is important but more incremental, extending existing IAM/OAuth concepts. Paper 1's novelty in identifying and systematically testing verifier bugs has broader near-term impact on AI safety and training reliability.

vs. RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

gpt-5.2 · 6/3/2026

Paper 2 has higher potential impact due to its broader real-world applicability and cross-field relevance: it targets a core bottleneck for deploying agentic AI safely in production—authorization, delegation, and accountability—spanning security, distributed systems, governance, and AI. Its contribution appears more foundational (compositional primitives + operator over existing policies) and claims formal proofs plus empirical evaluation, suggesting strong rigor and adoption potential. Paper 1 is timely and useful for LLM evaluation/alignment, but is narrower (benchmark + fine-tuning) and likely impacts primarily the role-playing/LLM alignment subcommunity.