Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang, Joshua Knox, Snehal Antani, Anton Foltz, Amy Villaseñor

May 4, 2026

arXiv:2605.03034v1 PDF

cs.AI(primary)cs.CReess.SY

#163of 2292·Artificial Intelligence

#163 of 2292 · Artificial Intelligence

Tournament Score

1526±43

10501800

81%

Win Rate

Wins

Losses

Matches

Rating

7.2/ 10

Significance7.5

Rigor7

Novelty7.8

Clarity7.5

Tournament Score

1526±43

10501800

81%

Win Rate

Wins

Losses

Matches

Rating

7.2/ 10

Significance

Rigor

Novelty

Clarity

Abstract

Agentic systems involved in high-stake decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configure endpoint detection and response (EDR) policies under adversarial pressure, we present a tool-mediated architecture: LLM agents use deterministic tools (Stackelberg best-response, Bayesian observer updates, attack-graph primitives) and select from finite action catalogs enforced at the tool-output interface. A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs, the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff (game value) by 59% relative to a deterministic greedy baseline, with zero variance across 40 runs at four temperatures. A Claude Haiku 4.5 controller converges to suboptimal game values but stays catalog-bounded over an additional 40 runs, demonstrating that architectural stability is not dependent on the controller capability. The LLM agent's non-determinism furthers creative exploration of strategies, while the tool-mediated architecture ensures system stability.

AI Impact Assessments

(1 models)

Scientific Impact Assessment

1. Core Contribution

This paper introduces a tool-mediated architecture for LLM-in-the-loop adversarial control in autonomous cyber defense, where formal stability guarantees are properties of the architectural loop rather than the agent itself. The key insight is that by constraining LLM agents to compose deterministic tools (Stackelberg best-response solvers, Bayesian observers, attack-graph primitives) and enforcing finite action catalogs at the tool-output interface, one can prove controllability, observability, and Input-to-State Stability (ISS) regardless of the LLM's non-determinism. The composite Lyapunov function V(k) = S(k) + λθ(k) is machine-checked in Lean 4 with zero `sorry`, combining game value and belief uncertainty into a single certificate.

The paper addresses a genuine gap: existing work on LLM safety focuses on output filtering, guardrails, or behavioral constraints, while this work provides system-level closed-loop stability guarantees. The framing of "stability as architectural discipline" — constraining the environment rather than the agent — is a conceptually clean and practically important contribution.

2. Methodological Rigor

Formal verification. The Lean 4 mechanization (~300 lines, zero sorry) is a genuine strength, though the scope is carefully delimited. Table A1 transparently separates what is proved in Lean (Lyapunov non-negativity, Kalman contraction, the three main claims) from what is assumed from cited results (monotonicity of network interdiction objective, class-K properties of αB, double oracle termination). The double-oracle termination argument remains informal — a notable gap given its importance to Claim (iii). The formalization instantiates Hayakawa et al.'s framework on attack graphs; the extension with anticipatory defense is incremental but sound.

Experimental design. Experiment 1 is well-powered: 282 real enterprise graphs from 161 organizations across 25 industries, with appropriate filtering criteria (18 degenerate graphs excluded with transparent justification). Statistical reporting is thorough — Wilson CIs, bootstrap CIs, paired Wilcoxon tests, Hodges-Lehmann effect sizes, Benjamini-Hochberg FDR correction. The 60% synthetic sensor coverage is a limitation acknowledged by the authors.

Experiment 2 is substantially more limited: a single GOAD graph (23 nodes, 44 edges) with 80 LLM runs. The zero-variance result for Sonnet 4 (all 40 runs converging to exactly S=0.3427) is striking and supports the architectural stability claim, but generalizability from one graph is inherently limited. The Haiku 4.5 comparison is valuable — showing that a weaker model stays catalog-bounded while converging to suboptimal values cleanly separates architectural guarantees from decision quality.

Potential concerns. The assumptions deserve scrutiny. A4 (persistent deployment — no rollback) is restrictive for real SOC operations. The finite catalog assumption (A2) is enforced architecturally but limits expressiveness. The adversary model assumes finite technique sets (A3), which bounds the ISS analysis but may not capture truly novel zero-day attacks. The paper acknowledges these limitations.

3. Potential Impact

Practical impact for SOCs. The architecture addresses real operational pain: EDR policy configuration under adversarial pressure with limited maintenance windows. The V(k) trajectory as an operational diagnostic (when to stop, what to invest in) has direct utility. The 59% game-value reduction over greedy baseline demonstrates practical value beyond formal guarantees.

Architectural pattern. The broader contribution — "wherever agentic systems act under adversarial pressure with a finite action catalog, stability becomes a formal property of the loop" — could influence adjacent domains (autonomous driving, financial trading, supply chain management). The decoupling of exploration from stability (action-level Jaccard 0.74–0.93 with zero outcome-level variance) is an elegant demonstration.

Dual-use concern. The authors note that Corollary 2 proves the adversary's certificate identically, making this genuinely dual-use. This is ethically relevant but transparently disclosed.

4. Timeliness & Relevance

The paper is extremely timely. The 2026 CrowdStrike report's 89% increase in AI-enabled attacks, combined with references to real AI-orchestrated attacks (Anthropic's disclosure, SentinelOne's detection), grounds the motivation. The gap between RL approaches requiring O(10³) training episodes and the need for single-cycle convergence is real and pressing. Eslami and Yu (2026) explicitly identified stability verification for LLM-based agentic systems as an open problem; this paper claims to address it.

5. Strengths & Limitations

Key strengths:

First mechanically-verified closed-loop stability certificate for a tool-mediated LLM controller

Clean separation of architectural safety from agent capability (Sonnet vs. Haiku comparison)

Counterintuitive finding that adversarial pressure improves observability (4.7× lower belief-truth gap)

Thorough statistical methodology with appropriate corrections

Transparent delineation of what is formally proved vs. assumed

Reproducible cost ($93.90 for 80 runs)

Notable weaknesses:

Experiment 2 uses only one graph with one LLM family — the generalization claim is undersupported

Raw pentest data cannot be released, limiting full reproducibility

Assumption A4 (no rollback) is operationally restrictive

Double-oracle termination is informal — a gap in the formal chain

The 300-line Lean formalization, while clean, encodes relatively simple mathematical structures; the hard work is in the modeling assumptions

The "zero variance" result may partially reflect that the catalog+budget combination admits few optimal solutions rather than demonstrating deep architectural properties

The greedy baseline is deliberately weak (myopic marginal ranking); comparison against stronger baselines (e.g., MILP, Monte Carlo tree search) would strengthen the contribution

Additional observations: The paper is from Horizon3.ai (a commercial pentesting company), which provides access to real attack graphs but also introduces potential conflicts of interest in framing. The use of their proprietary NodeZero platform as both data source and evaluation environment should be noted. The paper's length and appendix detail are commendable for reproducibility but the core narrative could be tighter.

Summary

This is a well-executed paper that makes a genuine contribution at the intersection of formal methods, game theory, and LLM-based autonomous systems. The architectural insight — stability as a loop property, not an agent property — is powerful and transferable. The formal verification, while bounded in scope, sets a new standard for LLM-in-the-loop systems. The empirical validation on real enterprise data is credible despite the limitations of Experiment 2. The primary risk to impact is the narrow empirical evaluation of the LLM controller (one graph, one model family) and the restrictive assumptions required for the formal guarantees.

Rating:7.2/ 10

Significance 7.5Rigor 7Novelty 7.8Clarity 7.5

Generated May 6, 2026

Comparison History (27)

vs. Introspection Adapters: Training LLMs to Report Their Learned Behaviors

gemini-35/6/2026

Paper 2 introduces a groundbreaking integration of LLM agents with formal verification (Lean 4) and control theory, offering mathematical guarantees for AI in high-stakes environments. While Paper 1 provides a clever empirical approach to AI safety, Paper 2's exceptional methodological rigor and neuro-symbolic architecture represent a significant paradigm shift for reliable, autonomous AI agents, likely sparking broad interdisciplinary research across AI, cybersecurity, and formal methods.

vs. Introspection Adapters: Training LLMs to Report Their Learned Behaviors

gemini-35/6/2026

Paper 2 bridges LLM agents, formal verification (Lean 4), and game theory to provide mathematically guaranteed stability in high-stakes adversarial environments. Its exceptional methodological rigor—using machine-checked proofs to bound LLM non-determinism—solves a major bottleneck in deploying autonomous AI safely. While Paper 1 offers a highly relevant approach to LLM auditing, Paper 2's formal guarantees and cross-disciplinary innovation give it a higher potential for foundational scientific impact.

vs. ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval

claude-opus-4.65/6/2026

Paper 1 presents a novel architecture combining formal verification (Lean 4 proofs) with LLM agents for autonomous cyber defense, offering provable stability guarantees (ISS, Lyapunov) in adversarial settings. This bridges formal methods and agentic AI in a high-stakes domain with immediate real-world applications. The methodology is rigorous with machine-checked proofs and empirical validation on 282 real attack graphs. Paper 2 introduces a useful benchmark for audio retrieval reasoning but is incremental—it identifies model weaknesses without proposing solutions. Paper 1's cross-disciplinary contributions (formal verification, AI safety, cybersecurity) give it broader and deeper impact potential.

vs. GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification

gemini-35/6/2026

Paper 2 addresses a critical gap in LLM agent deployment by introducing formally verified stability guarantees using Lean 4 and control theory. This rigorously tackles the unreliability of LLMs in high-stakes environments like cyber defense, offering profound implications for AI safety and autonomous systems. In contrast, Paper 1 presents a useful but domain-specific applied workflow for geology, lacking the cross-disciplinary theoretical innovation, methodological rigor, and broad systemic impact of Paper 2.

vs. Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates

claude-opus-4.65/6/2026

Paper 1 presents a novel, formally verified architecture for autonomous cyber defense with clear real-world applications in security operations centers. It combines LLM agents with deterministic tools, provides machine-checked stability guarantees (Lean 4 proofs), and demonstrates significant empirical improvements (59% attacker payoff reduction) on real enterprise data. The work bridges formal methods, game theory, and AI safety with immediate practical relevance. Paper 2 offers interesting empirical characterization of recursive LLM loop dynamics but is more narrowly focused on phenomenological observations with less clear practical applicability and limited theoretical contribution.

vs. Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates

claude-opus-4.65/6/2026

Paper 2 addresses a critical real-world problem (autonomous cyber defense) with a rigorous methodology combining formal verification (Lean 4 proofs), game-theoretic tools, and empirical validation on real enterprise attack graphs. It offers a novel architecture with provable stability guarantees (ISS via Lyapunov functions) for LLM-based agentic systems—a timely concern as LLM agents are deployed in high-stakes domains. Paper 1, while methodologically thorough in characterizing recursive LLM loop dynamics, addresses a more niche phenomenon with less immediate practical applicability and narrower cross-field impact.

vs. GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification

gemini-35/6/2026

Paper 1 offers significantly higher scientific impact by integrating formal verification (Lean 4) with LLM agent architectures, providing machine-checked stability guarantees for high-stakes decision-making. This addresses a major bottleneck in deploying autonomous agents in critical systems. While Paper 2 presents a solid applied workflow for geology, Paper 1's methodological rigor, mathematical formalization of control and stability, and robust validation in cyber defense represent a foundational advancement with broader potential applicability across AI safety, control theory, and adversarial security.

vs. From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

claude-opus-4.65/6/2026

Paper 1 presents a fundamentally novel contribution by combining formal verification (Lean 4 machine-checked proofs) with LLM-based autonomous cyber defense, providing provable stability guarantees (ISS, Lyapunov) for agentic AI in adversarial settings. This addresses a critical gap—formal safety guarantees for LLM agents in high-stakes domains—with rigorous methodology and strong empirical results (59% payoff reduction, zero variance). Paper 2 offers incremental improvements to multi-agent orchestration via IR-based agent recommendation, but lacks comparable theoretical depth, formal guarantees, or novelty. Paper 1's cross-disciplinary impact (formal methods, cybersecurity, AI safety) is substantially broader.

vs. From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

claude-opus-4.65/6/2026

Paper 1 presents a significantly more novel and rigorous contribution: a formally verified (Lean 4 machine-checked) architecture providing stability guarantees for LLM agents in adversarial cyber defense, combining Lyapunov stability theory with tool-mediated LLM control. It addresses a critical gap—formal safety guarantees for agentic AI in high-stakes domains—with strong empirical validation on real enterprise data. Paper 2 addresses useful but more incremental work on automating multi-agent system composition using standard IR techniques. Paper 1's cross-disciplinary novelty (formal methods + LLM agents + cybersecurity) and methodological rigor give it substantially higher impact potential.

vs. Coherent Hierarchical Multi-Label Learning to Defer for Medical Imaging

claude-opus-4.65/6/2026

Paper 1 addresses a novel, well-formalized problem (hierarchical multi-label learning to defer) with rigorous theoretical contributions including Bayes-optimal characterization and provably coherent solutions. It opens a new research direction at the intersection of structured prediction and human-AI collaboration in medical imaging—a high-impact domain. Paper 2 presents an interesting tool-mediated architecture for cyber defense with formal guarantees, but its contributions are more application-specific and incremental (combining existing tools like Stackelberg games, Lyapunov analysis, and LLM agents). Paper 1's theoretical depth and broader applicability to hierarchical decision-making give it higher potential impact.

vs. Coherent Hierarchical Multi-Label Learning to Defer for Medical Imaging

claude-opus-4.65/6/2026

Paper 1 presents a novel architecture combining LLMs with formally verified (Lean 4) stability guarantees for autonomous cyber defense—a highly timely problem given increasing AI autonomy in security-critical domains. Its contributions span formal methods, game theory, and agentic AI, with strong empirical validation on real enterprise data. The formal stability guarantees for LLM-based agents are pioneering. Paper 2 makes a solid theoretical contribution to learning-to-defer with hierarchical labels, but addresses a narrower problem with more incremental advances. Paper 1's breadth of impact across AI safety, formal verification, and cybersecurity gives it higher potential impact.

vs. FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

claude-opus-4.65/6/2026

Paper 1 presents a novel architecture combining formal verification (Lean 4 proofs) with LLM agents for autonomous cyber defense, offering provable stability guarantees (ISS, Lyapunov) for agentic AI in adversarial settings. This addresses a critical gap—formal safety guarantees for LLM-based autonomous systems—with broad implications beyond cybersecurity. The machine-checked proofs, architectural separation of LLM creativity from deterministic tool guarantees, and demonstration that stability is controller-independent represent foundational contributions. Paper 2, while solid, is more incremental: a domain-specific benchmark and CoT strategies for financial time series, building on existing TSRM paradigms with narrower cross-field impact.

vs. Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS

gpt-5.25/6/2026

Paper 2 likely has higher impact: it tackles a high-stakes, widely relevant domain (autonomous cyber defense) and contributes a tool-mediated LLM architecture with formal, machine-checked stability/robustness guarantees (Lean 4, Lyapunov/ISS), which is methodologically rigorous and broadly influential for safe agentic systems beyond security. It is timely given concerns about LLM autonomy and reliability, and shows empirical validation on real enterprise attack graphs with strong, low-variance performance gains. Paper 1 is novel for LLM-assisted solver synthesis, but its primary impact is narrower (CVRP optimization) and offers fewer formal guarantees.

vs. QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems

gemini-35/6/2026

While Paper 2 offers exceptional methodological rigor with formal guarantees for cyber defense, Paper 1 demonstrates a landmark achievement in artificial intelligence: generating original, expert-verified proofs for open mathematical problems. By identifying specific LLM failure modes and systematically overcoming them to solve previously unsolved PDEs, Paper 1 represents a fundamental leap in AI's capability to contribute directly to scientific discovery. Its open-source nature and success in advancing human knowledge across mathematics give it a broader and more profound fundamental scientific impact.

vs. Programmatic Context Augmentation for LLM-based Symbolic Regression

gpt-5.25/6/2026

Paper 2 has higher impact potential due to its combination of formal, machine-checked stability guarantees (Lean 4, Lyapunov/ISS), a constrained tool-mediated architecture addressing safety in high-stakes adversarial settings, and strong empirical validation on real enterprise attack graphs with large payoff reductions and robustness across temperatures. Its contributions span LLM agents, control theory, formal methods, and cybersecurity, making it timely and broadly influential. Paper 1 is novel and useful for scientific discovery workflows, but its methodological rigor and cross-field impact are less transformative than certified stability for autonomous cyber defense.

vs. Making the Invisible Visible: Understanding the Mismatch Between Organizational Goals and Worker Experiences in AI Adoption

gpt-5.25/6/2026

Paper 1 is likely higher impact due to strong novelty (tool-mediated LLM control with formal, machine-checked Lyapunov/ISS guarantees), methodological rigor (Lean 4 proof, adversarial game framing, empirical validation on 282 enterprise attack graphs plus repeated-run stability), and timely real-world applicability in autonomous cyber defense. Its ideas could transfer to broader safety-critical agentic systems requiring verifiable robustness. Paper 2 addresses an important, timely socio-technical problem with cross-domain relevance, but its qualitative interview-based contribution and proposed strategies, while valuable, typically yield less technical step-change impact than formally verified, deployable architectures.

vs. Geometry over Density: Few-Shot Cross-Domain OOD Detection

claude-opus-4.65/6/2026

Paper 2 introduces a genuinely novel paradigm ('train-once, deploy-anywhere') for OOD detection using information-geometric analysis of diffusion trajectories, achieving ~500× sample efficiency improvement. Its theoretical insight connecting diffusion score functions to discrete Sobolev norms is elegant and broadly applicable across domains. Paper 1, while rigorous with its Lean 4 proofs and practical for cyber defense, addresses a narrower application domain and combines existing techniques (Stackelberg games, Lyapunov stability, LLM agents) rather than introducing fundamentally new concepts. Paper 2's cross-domain generalization framework has broader impact potential across ML safety and deployment.

vs. Stop Automating Peer Review Without Rigorous Evaluation

gpt-5.25/6/2026

Paper 1 presents a novel, rigorously formalized tool-mediated LLM control architecture with machine-checked stability/robustness guarantees and demonstrated performance on real enterprise attack graphs, enabling practical autonomous cyber defense. Its methodological rigor (Lean-verified Lyapunov/ISS properties), clear operational application, and cross-field relevance (control theory, formal methods, cybersecurity, agentic AI) suggest broad, high impact. Paper 2 is timely and important for policy/ML evaluation, but as a position paper its contributions are narrower and less technically enabling than a verified architecture with deployable implications.

vs. Geometry over Density: Few-Shot Cross-Domain OOD Detection

claude-opus-4.65/6/2026

Paper 1 introduces a novel paradigm (train-once, deploy-anywhere) for OOD detection using information-geometric analysis of diffusion trajectories, achieving ~500x sample efficiency improvement. It has broader impact across multiple ML domains and applications, offers a theoretically grounded framework connecting diffusion models to Sobolev norms, and demonstrates strong empirical results across 12 benchmarks. Paper 2 addresses an important but narrower cybersecurity niche, combining LLMs with formal verification for autonomous defense. While rigorous, its impact is more domain-specific. Paper 1's methodological contributions to foundational ML problems give it wider applicability and higher potential citation impact.

vs. Making the Invisible Visible: Understanding the Mismatch Between Organizational Goals and Worker Experiences in AI Adoption

gpt-5.25/6/2026

Paper 1 is more novel and technically innovative, combining tool-mediated LLM control with formal, machine-checked stability/robustness guarantees (Lean 4) for autonomous cyber defense—an area with immediate high-stakes applications. Its methodological rigor (Lyapunov/ISS certificates, finite action catalogs, evaluations on real attack graphs and telemetry) supports credible, generalizable impact across AI safety, control, formal methods, and security. Paper 2 addresses an important, timely socio-technical problem, but its qualitative insights and recommendations are less technically novel and typically yield narrower downstream scientific leverage than a formally verified architecture with measurable performance gains.