Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment
Tanav Singh Bajaj, Nikhil Singh, Karan Anand, Eishkaran Singh
Abstract
As large language models are increasingly deployed as interacting agents in high-stakes decisions, the AI safety community assumes that safety properties of individual models will compose into safe multi-agent behavior. This position paper argues that this assumption is fundamentally mistaken. In agentic AI, safety is determined by interaction topology, not model weights. When agents deliberate sequentially or aggregate via parallel voting with a judge, the structure of information flow and decision coupling dominates outcomes. Evidence across model families and scales reveals three persistent topology-driven pathologies: ordering instability, where system behavior depends primarily on agent sequence; information cascades, where early judgments propagate regardless of correctness; and functional collapse, where systems satisfy fairness metrics while abandoning meaningful risk discrimination. Contrary to intuition, scaling to more capable models strengthens these effects by increasing consensus formation and reducing the challenge of initial decisions. These failure modes are invisible to model-centric evaluation and alignment procedures. We argue that agentic AI must be treated as a dynamical system rather than a collection of aligned components. Interaction topology must become a primary target of safety evaluation and regulation, with systems required to demonstrate robustness across architectural variations before deployment.
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This position paper articulates a specific thesis: in multi-agent LLM systems, safety and fairness properties are determined primarily by interaction topology (ordering, communication structure, aggregation rules) rather than by individual model alignment or scale. The paper identifies three topology-driven pathologies—ordering instability, information cascades, and functional collapse—and provides empirical evidence from a synthetic credit approval task across multiple model families (LLaMA 3B/70B, Mistral 7B, Qwen 72B) in both sequential and parallel architectures.
The conceptual reframing—treating multi-agent LLM systems as dynamical systems with topology as a control parameter rather than collections of independently safe components—is the paper's most valuable contribution. The "compositionality myth" framing is sharp and well-articulated: the assumption that individually aligned agents compose into safe systems is analogous to assuming that stable individual components guarantee stable systems, a notion long refuted in control theory, ecology, and social choice.
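To make the third pathology concrete, the sketch below shows how a system can pass a fairness check while no longer separating risk. It is an illustration under stated assumptions, not the paper's evaluation code: the applicant records are made up, demographic parity is assumed as the fairness metric (the source does not say which metrics were used), and `parity_gap` and `risk_separation` are hypothetical helper names.

```python
from statistics import fmean

# Hypothetical applicants: two groups, two risk tiers, balanced (not paper data).
APPLICANTS = [
    {"group": g, "risk": r}
    for g in ("a", "b")
    for r in ("low", "high")
    for _ in range(2)
]


def approval_rate(decisions, key, value):
    """Approval rate among decisions whose `key` field equals `value`."""
    return fmean([d["approved"] for d in decisions if d[key] == value])


def parity_gap(decisions):
    """Assumed fairness metric: absolute approval-rate gap between groups."""
    return abs(
        approval_rate(decisions, "group", "a")
        - approval_rate(decisions, "group", "b")
    )


def risk_separation(decisions):
    """Approval rate for low-risk minus high-risk applicants."""
    return approval_rate(decisions, "risk", "low") - approval_rate(
        decisions, "risk", "high"
    )


# A collapsed system approves everyone; a discriminating one approves low risk only.
collapsed = [{**a, "approved": True} for a in APPLICANTS]
discriminating = [{**a, "approved": a["risk"] == "low"} for a in APPLICANTS]

for name, decisions in (("collapsed", collapsed), ("discriminating", discriminating)):
    print(name, parity_gap(decisions), risk_separation(decisions))
```

Both toy systems show a zero parity gap, but only the second separates low-risk from high-risk applicants. That is why a fairness metric reported alone cannot rule out functional collapse.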
Methodological Rigor
The experimental design has notable strengths in its controlled ablation approach: agents, prompts, task, and model are held fixed while only the topology varies. The sequential architecture is run under all 24 orderings (4! = 24, which implies four agents), and the parallel architecture uses a judge to aggregate. This clean isolation of the causal variable is commendable.
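The ablation described above can be sketched as a sweep over every agent ordering. This is a toy illustration, not the paper's pipeline: the four agent names, their fixed `SIGNAL` values, and `DEFER_THRESHOLD` are hypothetical stand-ins for LLM calls, and the judge is assumed to simply sum the signals.

```python
from itertools import permutations

# Hypothetical agent roles and private signals (positive leans toward approval).
# These stand in for LLM agents; they are not taken from the paper.
SIGNAL = {"income": 0.6, "credit_history": 0.2, "debt": -0.3, "compliance": -0.9}
DEFER_THRESHOLD = 0.5  # assumed: weaker signals defer to the running verdict


def run_sequential(order):
    """Sequential chain: each agent sees the verdict so far and may override it."""
    verdict = None
    for agent in order:
        signal = SIGNAL[agent]
        # The first agent always sets the verdict; later agents override it
        # only when their own signal is strong enough.
        if verdict is None or abs(signal) > DEFER_THRESHOLD:
            verdict = signal > 0
    return verdict


def run_parallel_with_judge(order):
    """Parallel topology: agents answer independently and a judge sums the signals."""
    return sum(SIGNAL[agent] for agent in order) > 0


orderings = list(permutations(SIGNAL))  # 4! = 24 orderings
sequential = [run_sequential(o) for o in orderings]
parallel = [run_parallel_with_judge(o) for o in orderings]

print(f"sequential: approved in {sum(sequential)} of {len(orderings)} orderings")
print(f"parallel:   distinct verdicts across orderings = {set(parallel)}")
```

In this toy, whichever strongly opinionated agent speaks last decides the sequential verdict, so the same application is approved under some orderings and denied under others, while the parallel verdict does not depend on order. A real sweep would replace the fixed signals with model calls and repeat the loop for every application.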
However, several methodological concerns limit confidence in the generalizability of claims:
1. Single domain, synthetic data: All experiments use a synthetic credit approval task with 5,760 applications generated from a Cartesian product grid. While this controls confounds, it raises questions about ecological validity. Credit decisions have particular properties (binary outcomes, clear risk gradients) that may amplify or attenuate topological effects differently than other domains.
2. Limited topology space explored: Only two topologies are studied—sequential chains and parallel-with-judge. The paper's sweeping claims about "interaction topology" as the primary safety determinant rest on a binary comparison. More complex topologies (hierarchical, partially connected, iterative debate with feedback) are not explored.
3. Temperature and stochasticity: Experiments use T=0.7, which introduces sampling variance. The paper reports neither confidence intervals nor results across multiple seeds, and it does not separate topology-driven variance from stochastic sampling variance. This is a significant gap for a paper claiming topology is the *primary* determinant: supporting that claim requires quantifying topology's effect against other sources of variance.
4. Scale comparison confounds: Comparing 3B to 70B models conflates many differences (training data, architecture, RLHF details) with "scale." The claim that "scale amplifies cascades" is suggestive but not rigorously established—these are different models, not controlled scale interventions.
5. The "functional collapse" finding for 3B parallel may partly reflect that 3B models are simply poor judges when aggregating conflicting signals, a capability limitation rather than a deep topological insight.
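One way to address the variance concern in item 3 is to run each ordering under several seeds and split the total variance of the approval rate into a between-ordering part and a within-ordering part. The sketch below assumes an equal number of seeds per ordering and uses population variances, so the two parts sum exactly to the total. The ordering labels and approval rates are made up for illustration, and `decompose_variance` is a hypothetical helper, not something the paper provides.

```python
from statistics import fmean, pvariance


def decompose_variance(rates_by_ordering):
    """Split total variance into between-ordering and within-ordering parts.

    rates_by_ordering maps an ordering label to approval rates, one per seed.
    Assumes the same number of seeds per ordering, so that
    total variance = between + within (law of total variance).
    """
    groups = list(rates_by_ordering.values())
    between = pvariance([fmean(g) for g in groups])  # topology-driven part
    within = fmean([pvariance(g) for g in groups])   # sampling-driven part
    return between, within


# Made-up approval rates for illustration only (two orderings, three seeds each).
rates = {
    "A-B-C-D": [0.30, 0.34, 0.32],
    "D-C-B-A": [0.58, 0.62, 0.60],
}
between, within = decompose_variance(rates)
total = pvariance([r for g in rates.values() for r in g])

print(f"between={between:.4f} within={within:.4f} total={total:.4f}")
print(f"share attributable to ordering: {between / total:.1%}")
```

A large between-ordering share would support the claim that topology, rather than sampling noise at T=0.7, drives the observed differences. A small share would suggest the opposite.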
Potential Impact
The paper addresses an increasingly relevant deployment pattern—multi-agent LLM systems in high-stakes domains. Its practical recommendations (topological sweeps in evaluation, architecture stress-testing, topological disclosure in governance) are actionable and timely.
Positive impact vectors:
Limitations on impact:
Timeliness & Relevance
The paper is well-timed. Multi-agent LLM deployments are proliferating rapidly, and there is a genuine regulatory gap around interaction-level safety. The EU AI Act and emerging frameworks focus almost exclusively on individual models. The observation that current governance targets the wrong object is important and timely.
However, several concurrent works (Wang et al., 2025 on agreement bias; Bisconti et al., 2025 on multi-agent risks; Śliwiński et al., 2024 on communication topology in debate) address overlapping concerns, reducing the novelty of the position.
Strengths
Limitations
Overall Assessment
This is a well-written position paper that raises a genuinely important issue for the AI safety community. Its core observation—that interaction structure can dominate component properties in multi-agent systems—is sound and under-appreciated. However, the empirical evidence, while suggestive, is too narrow to fully support the strong claims made. The paper would benefit from formal analysis, broader domain coverage, proposed mitigations, and more careful variance decomposition. It serves as a useful provocation and agenda-setting piece rather than a definitive scientific contribution.
Generated May 5, 2026
Comparison History (34)
Paper 1 introduces a fundamentally novel perspective on AI safety—that interaction topology, not model alignment, determines multi-agent system safety. This challenges core assumptions in the AI safety community and has broad implications for regulation, evaluation, and system design. It identifies three specific failure modes (ordering instability, information cascades, functional collapse) that reframe how we think about scaling and alignment. Paper 2 addresses a practical but narrower engineering problem (model migration) with incremental methodological contributions. Paper 1's breadth of impact, timeliness given the rise of agentic AI, and paradigm-shifting argument give it substantially higher scientific impact potential.
Paper 2 addresses a fundamental and timely gap in AI safety research—that multi-agent system safety depends on interaction topology rather than individual model alignment. This conceptual reframing has broad implications across AI safety, regulation, and multi-agent system design, potentially influencing policy and evaluation standards. Paper 1, while practically useful, presents a relatively incremental engineering contribution (a Bayesian framework for model migration) with narrower scope limited to enterprise LLM operations. Paper 2's novel theoretical insight about topology-driven failure modes (ordering instability, information cascades, functional collapse) challenges core assumptions in the field and is likely to generate significant discussion and follow-up research.
Paper 2 presents a more paradigm-shifting insight—that multi-agent AI safety is governed by interaction topology rather than individual model properties—which challenges fundamental assumptions in AI safety. This reframing has broader implications across AI safety, regulation, and multi-agent systems design. While Paper 1 offers a solid technical contribution (ISOPro framework) with practical validation, its impact is more incremental, addressing known RLHF evaluation gaps. Paper 2's identification of topology-driven pathologies (ordering instability, information cascades, functional collapse) that worsen with scale is counterintuitive and likely to catalyze new research directions.
Paper 2 introduces a more paradigm-shifting conceptual framework—that multi-agent AI safety is governed by interaction topology rather than individual model properties. This challenges fundamental assumptions in AI safety and has broader implications across multi-agent systems, regulation, and deployment practices. Its insight that scaling worsens topology-driven pathologies is counterintuitive and impactful. Paper 1, while technically solid with ISOPro, addresses a narrower problem (evaluation frameworks and reward hacking) with validation limited to a constrained scheduling domain. Paper 2's breadth of impact across safety, fairness, and regulation gives it higher potential influence.
Paper 1 challenges a foundational assumption in AI safety, proposing a paradigm shift from model-centric alignment to evaluating interaction topology in multi-agent systems. This offers broad, systemic implications across AI development, safety evaluations, and regulation. In contrast, Paper 2 presents a practical but narrower, incremental application of LLMs to legal auto-formalization (GDPR) with a human-in-the-loop framework, limiting its overall scientific and theoretical impact compared to Paper 1.
Paper 2 fundamentally challenges prevailing assumptions in AI safety by arguing that interaction topology, rather than individual model alignment, dictates multi-agent system behavior. This paradigm shift has broad, field-wide implications for how agentic AI is evaluated and regulated. In contrast, Paper 1 offers a valuable but more narrowly focused application of LLMs to legal text formalization.
Paper 1 offers a concrete, theoretically grounded method (LAPD) for AI-generated text detection with provable guarantees and strong empirical results (45.82% improvement over baselines). It addresses an urgent practical problem with a novel theoretical insight connecting alignment processes to detection. Paper 2, while thought-provoking in arguing that interaction topology matters more than model alignment for multi-agent safety, is a position paper without new methods or formal frameworks—its claims, though important, are harder to validate and build upon. Paper 1's combination of theoretical rigor, practical utility, and measurable improvements gives it broader and more immediate scientific impact.
Paper 2 introduces a concrete, large-scale benchmark (DESPITE) with 12,279 tasks and evaluates 23 models, providing rigorous empirical evidence for a critical finding: planning ability scales with model size but safety awareness does not. This actionable, reproducible contribution directly impacts robotics and embodied AI deployment. Paper 1, while raising important theoretical points about interaction topology, is a position paper with less empirical grounding. Paper 2's benchmark will likely drive follow-up research and standardize safety evaluation for LLM-based planners, giving it broader and more immediate scientific impact.
Paper 1 challenges a fundamental assumption in the highly relevant and rapidly growing field of AI safety, proposing a paradigm shift towards evaluating interaction topology in multi-agent systems. Its findings have broad implications across AI, complex systems, and regulation. In contrast, Paper 2 offers an incremental improvement (0.28%) on a specific optimization problem using standard machine learning techniques, limiting its impact primarily to a niche subfield of operations research.
Paper 2 offers a concrete, theoretically grounded, and empirically validated solution to the critical real-world problem of AI-generated text detection. While Paper 1 presents a thought-provoking position on agentic AI safety, Paper 2 provides rigorous statistical guarantees and a massive 45.82% performance improvement over baselines. Its immediate practical applicability and strong methodological rigor give it a higher potential for direct, measurable scientific and societal impact.
Paper 2 challenges fundamental assumptions in AI alignment by shifting focus from individual model safety to interaction topology in multi-agent systems. This paradigm-shifting perspective has broader implications across all agentic AI applications, whereas Paper 1 focuses more specifically on embodied robotics. By identifying systemic pathologies that render model-centric alignment insufficient, Paper 2 is likely to spark widespread theoretical and architectural changes across the broader AI safety community.
Paper 1 likely has higher scientific impact due to its broader, timely relevance to AI safety and governance of multi-agent LLM systems. It reframes safety/fairness as emergent properties of interaction topology, highlighting topology-driven failure modes (ordering instability, cascades, functional collapse) that current model-centric alignment/evaluation may miss—an arguably novel, field-shaping perspective with cross-disciplinary implications (multi-agent systems, dynamical systems, regulation). Paper 2 is methodologically solid and practically useful for a specific optimization problem, but its gains are incremental and domain-narrow.
Paper 2 introduces a practical, generalizable protocol (epistemic blinding) that addresses a fundamental and widely overlooked problem—prior contamination in LLM outputs—with concrete demonstrations across biology and finance. It provides an open-source tool enabling immediate adoption, has clear real-world applications in drug discovery and financial analysis, and addresses a problem that grows more urgent as LLM-assisted analysis proliferates. Paper 1 raises important conceptual points about multi-agent topology but is more of a position/framing paper without novel methodological contributions or tools, limiting its direct actionable impact.
Paper 2 introduces a practical, generalizable protocol (epistemic blinding) that addresses a fundamental and previously underappreciated problem—prior contamination in LLM outputs. It provides concrete demonstrations across multiple domains (oncology, finance), releases open-source tools for adoption, and offers an immediately actionable methodology. While Paper 1 raises important conceptual points about multi-agent topology, it is a position paper without novel solutions. Paper 2's combination of a clearly defined problem, a practical protocol, cross-domain validation, and open-source tooling gives it higher potential for broad adoption and real-world impact.
Paper 2 addresses a fundamental and timely issue in AI safety for multi-agent LLM systems, arguing that interaction topology—not model alignment—determines safety outcomes. This challenges a core assumption in the field and has broad implications for regulation, evaluation, and deployment of agentic AI systems. Its findings (ordering instability, information cascades, functional collapse) are broadly applicable across domains. While Paper 1 makes a solid contribution to neuro-symbolic RL with autonomous concept grounding, its scope is narrower (Atari games) and the impact is more incremental within an established subfield.
Paper 1 addresses a fundamental and timely gap in AI safety for multi-agent systems, challenging a core assumption held by the safety community. Its identification of topology-driven pathologies (ordering instability, information cascades, functional collapse) has broad implications for regulation, evaluation, and deployment of agentic AI systems. The finding that scaling amplifies rather than mitigates these issues is counterintuitive and highly relevant. Paper 2, while technically solid in combining NeSy-RL with LLM-based concept grounding, addresses a narrower problem with less immediate broad impact. Paper 1's policy and safety implications give it wider cross-field relevance.
Paper 2 challenges fundamental assumptions in AI safety, arguing that interaction topology overrides individual model alignment in multi-agent systems. This represents a significant paradigm shift with broad implications across all domains deploying agentic AI. While Paper 1 provides a valuable domain-specific application for climate science, Paper 2's theoretical reframing of AI safety affects the foundational development and regulation of multi-agent systems globally, giving it a broader and potentially more transformative scientific impact.
Paper 1 has higher likely scientific impact due to its broader, more novel claim: that safety/fairness failures in multi-agent LLM systems are primarily topology-driven and can worsen with scale. This reframes evaluation and alignment from model-centric to system/dynamical-systems considerations, potentially affecting many agent architectures and deployment patterns across fields (AI safety, HCI, governance, distributed systems). Paper 2 is more application-focused and potentially valuable for enterprise compliance, but its impact is narrower and depends on strong assumptions (convex projections, bounded convergence) that may limit generality.
Paper 1 proposes a significant paradigm shift in AI safety, arguing that system-level interaction topology, rather than individual model alignment, dictates the safety of agentic AI. This has profound implications for multi-agent systems, AI regulation, and evaluation frameworks. Paper 2, while methodologically rigorous and providing valuable empirical data on LLM recursive loops, focuses on a much narrower, specific technical problem. Paper 1's broader applicability to high-stakes decision-making, fairness, and the rapidly growing field of agentic AI gives it a substantially higher potential for widespread scientific and real-world impact.
Paper 1 challenges a fundamental assumption in AI safety, proposing a paradigm shift from model-centric alignment to interaction topology in multi-agent systems. This conceptual novelty has the potential to broadly redirect future research and regulation across the field. While Paper 2 provides a rigorous and highly useful benchmark, its impact is more methodological and incremental, whereas Paper 1 offers a foundational theoretical reframing with lasting, cross-disciplinary implications.