Solipsistic Superintelligence is Unlikely to be Cooperative

Rakshit S Trivedi, Natasha Jaques, Logan Cross, Alexander Sasha Vezhnevets, Joel Z Leibo

Jun 2, 2026

arXiv:2606.03237v1

cs.AI (primary) · cs.CL · cs.CY · cs.LG · cs.MA

#2175 of 3404 · Artificial Intelligence

Tournament Score: 1369 ± 43 (scale 1050 to 1800)

Win Rate: 45%

Rating: 6.5 / 10

Significance 7.5 · Rigor 5.5 · Novelty 6 · Clarity 8.5

Abstract

AI's central challenge is shifting from capability to coexistence. The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback. We contend that superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative. Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context. We refer to this as the self-undermining property of unilateral optimization. Closing this gap requires AI that participates in cooperation: the equilibrium-selection process through which multiple actors navigate their interdependence. We call for a non-solipsistic research paradigm that treats this interdependence as a core design principle rather than approaching cooperation as a task to solve. This entails building dynamic evaluation testbeds involving adaptive counterparties, treating institutions as design primitives, and preserving human agency as a structural feature of the systems we build.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper introduces and articulates the concept of "solipsistic superintelligence" — AI systems built under implicit assumptions of environmental exogeneity, distributional stationarity, and singleton framing — and argues these systems are structurally unlikely to be cooperative when deployed in multi-agent environments. The central novelty lies in reframing the AI safety problem: rather than focusing on misalignment of objectives (the traditional safety concern), the authors identify a structural failure mode where perfectly aligned systems still produce collective harm because deployment induces endogenous non-stationarity through behavioral, institutional, and algorithmic adaptation channels.

The paper formalizes the "train-test-deploy gap" (distinct from the standard train-test gap), the "self-undermining property" of unilateral optimization, and "equilibrium selection risk." It proposes three concrete research directions: dynamic evaluation with adaptive counterparties, institutions as design primitives, and preservation of human agency.

Methodological Rigor

As a position paper, this work is primarily argumentative rather than empirical. The formal contributions (Appendix A) are sound but relatively straightforward extensions of known game-theoretic concepts. The transition from MDPs to Markov games, the definition of endogenous non-stationarity, and the self-undermining property (Proposition A.4) are cleanly stated and logically coherent, though they formalize intuitions already well-understood in multi-agent systems and game theory.
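Equilibrium selection, one of the notions the paper formalizes, can be illustrated with a minimal Python sketch of a stag hunt, a standard two-player coordination game. The payoff numbers and function names below are illustrative assumptions, not the paper's Appendix A construction.

```python
from itertools import product

# Row player's payoff in a symmetric stag hunt: PAYOFF[my_action][their_action].
# Actions: 0 = stag (cooperate), 1 = hare (play safe). Numbers are illustrative.
PAYOFF = [[4, 0],
          [3, 3]]

def best_response(their_action):
    """Action that maximizes my payoff against a fixed opponent action."""
    return max((0, 1), key=lambda a: PAYOFF[a][their_action])

def pure_nash_equilibria():
    """Profiles in which each action is a best response to the other."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if a == best_response(b) and b == best_response(a)]

def alternating_best_response(start, steps=10):
    """Players take turns best-responding, beginning from the profile `start`."""
    a, b = start
    for _ in range(steps):
        a = best_response(b)
        b = best_response(a)
    return (a, b)

print(pure_nash_equilibria())             # [(0, 0), (1, 1)]
print(alternating_best_response((0, 1)))  # (1, 1)
print(alternating_best_response((1, 0)))  # (0, 0)
```

Both players best-respond perfectly in every run, yet the outcome depends only on the starting profile. In this toy setting, landing on the cooperative equilibrium is a matter of selection rather than of either player's capability.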

The paper draws effectively on diverse empirical evidence — algorithmic collusion (Calvano et al., 2020), the Flash Crash, recommendation-driven polarization, alignment faking — to ground its claims. However, these examples serve more as illustrations than as systematic evidence for the paper's central thesis. The connection between current-scale failures and the projected superintelligence case relies on extrapolation rather than demonstration. The paper acknowledges this limitation implicitly by scoping its claims to "sociotechnical settings where advanced AI deployment will be heavily exposed to response dynamics."

The alternative views section (Section 6 and Appendix D) demonstrates intellectual honesty, addressing six substantive counterarguments. The rebuttals are generally well-reasoned, though the response to Argument 1 (multi-actor designs have worse failure modes) somewhat sidesteps the practical tractability concern — that engineering multi-agent cooperative systems is harder than engineering single aligned agents.

Potential Impact

The paper's framing could be influential in several ways:

1. Reorienting safety research: By arguing that cooperation is an equilibrium property rather than a capability to be trained, the paper challenges the implicit assumption that scaling + alignment = safe deployment. This could redirect research investment toward institutional design, mechanism design for AI agents, and dynamic evaluation.

2. Evaluation methodology: The call for dynamic evaluation testbeds with adaptive counterparties addresses a genuine gap. Table 1 in Appendix C provides a useful taxonomy. If adopted, this could fundamentally change how frontier models are assessed.

3. Policy and governance: The legitimacy arguments (Section 4.2) connect technical AI development to democratic theory and institutional economics in ways that could influence AI governance frameworks.

4. Bridging communities: The paper synthesizes insights from game theory, institutional economics, political philosophy, and ML into a coherent framework, potentially catalyzing cross-disciplinary collaboration.

However, the practical impact may be limited by the paper's conceptual rather than constructive nature. It identifies the problem clearly but offers relatively little in terms of concrete algorithms, benchmarks, or experimental protocols. The research agenda in Section 5 remains at a high level of abstraction.
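As a rough indication of what a dynamic evaluation with adaptive counterparties might involve, the following Python sketch scores a policy in an iterated prisoner's dilemma against both a static and an adaptive counterparty. The payoff values, policy names, and the `evaluate` helper are hypothetical choices made for illustration; the paper does not specify such a protocol.

```python
# Payoff to the evaluated agent, indexed by (agent_move, counterparty_move).
# Standard prisoner's dilemma ordering; the exact numbers are an assumption.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def always_defect(my_moves, their_moves):
    return "D"

def always_cooperate(my_moves, their_moves):
    return "C"

def tit_for_tat(my_moves, their_moves):
    # Cooperate first, then copy the other side's previous move.
    return their_moves[-1] if their_moves else "C"

def average_payoff(agent, counterparty, rounds=50):
    """Mean per-round payoff of `agent` when playing `counterparty`."""
    agent_moves, other_moves, total = [], [], 0
    for _ in range(rounds):
        a = agent(agent_moves, other_moves)
        b = counterparty(other_moves, agent_moves)
        total += PAYOFF[(a, b)]
        agent_moves.append(a)
        other_moves.append(b)
    return total / rounds

def evaluate(agent, panel, rounds=50):
    """Score one agent against every counterparty in the panel."""
    return {name: average_payoff(agent, cp, rounds) for name, cp in panel.items()}

panel = {"static": always_cooperate, "adaptive": tit_for_tat}
print(evaluate(always_defect, panel))  # {'static': 5.0, 'adaptive': 1.08}
print(evaluate(tit_for_tat, panel))    # {'static': 3.0, 'adaptive': 3.0}
```

Against the static counterparty, `always_defect` looks like the stronger policy; against the adaptive one the ranking reverses. A benchmark limited to fixed counterparties would miss that reversal.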

Timeliness & Relevance

This paper is exceptionally well-timed. The rapid deployment of LLM-based agents in economic settings (coding assistants, search, recommendation, pricing) makes the multi-agent interaction problem urgent rather than speculative. Recent empirical results on LLM agents engaging in algorithmic collusion (Fish et al., 2026; Bertrand et al., 2025) and alignment faking (Greenblatt et al., 2024) provide concrete evidence for the dynamics the paper describes. The Collingridge dilemma argument, that waiting for evidence of catastrophe before changing course defers action to the point where correction is most costly, is particularly compelling given current deployment trajectories.

The paper also addresses a genuine intellectual gap: cooperative AI and multi-agent RL have long recognized these dynamics, but their insights remain "peripheral to the central scaling pathway" of foundation model development. Making this gap explicit at a venue like ICML could have agenda-setting effects.

Strengths

Conceptual clarity: The "solipsistic" framing is memorable and precisely defined through three assumptions (exogeneity, stationarity, singleton framing). The three channels of adaptation (behavioral, institutional, algorithmic) provide useful structure.

Intellectual breadth: The synthesis of game theory, institutional economics, political philosophy, and ML is genuinely valuable and rarely attempted at this level of coherence.

Well-articulated distinction from alignment: The argument that cooperation ≠ alignment addresses a common conflation and identifies a genuinely underappreciated failure mode.

Rich empirical documentation: Appendix B provides extensive real-world examples across domains, strengthening the argumentative case.

Limitations

Constructive deficit: The paper is substantially stronger on diagnosis than prescription. The three research directions (Section 5) lack the specificity needed to guide implementation. What exactly should a dynamic evaluation look like? How should institutions be "designed as primitives"?

Scope ambiguity: The paper oscillates between claims about current systems and claims about hypothetical superintelligence. The extrapolation from observed multi-agent failures to superintelligence-scale risks is asserted rather than demonstrated.

Novelty concerns: Many individual arguments are well-established in cooperative AI, multi-agent systems, mechanism design, and institutional economics. The synthesis is valuable but the components are familiar to researchers in those fields.

Missing engagement with counterexamples: Some deployed multi-agent systems (e.g., internet routing protocols, DNS) have achieved stable cooperation through design. Discussing these success cases would strengthen the prescriptive claims.

No experimental validation: Even a small-scale demonstration of the self-undermining property or equilibrium selection risk in a controlled setting would substantially strengthen the paper's empirical grounding.
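A small-scale demonstration of the kind suggested here could be as simple as the following Python toy, in which a policy optimized against logged behavior loses its advantage once the counterparty adapts to it. The matching game, the 70/30 log, and all function names are assumptions made for illustration; this is not the paper's Proposition A.4.

```python
def exploit_policy(history):
    """Choose the action that matches the logged opponent most often.
    In this toy game the agent wins a round when the two actions match."""
    heads = sum(1 for a in history if a == "H")
    return "H" if heads >= len(history) - heads else "T"

def static_score(agent_action, history):
    """Win rate predicted if the opponent keeps behaving as in the log."""
    return sum(1 for a in history if a == agent_action) / len(history)

def adaptive_opponent(agent_action):
    """An opponent that observes the deployed policy and avoids matching it."""
    return "T" if agent_action == "H" else "H"

def deployed_score(agent_action, rounds=100):
    """Win rate realized once the opponent responds to the deployed policy."""
    wins = sum(1 for _ in range(rounds)
               if adaptive_opponent(agent_action) == agent_action)
    return wins / rounds

history = ["H"] * 70 + ["T"] * 30      # logged opponent behavior (hypothetical)
action = exploit_policy(history)

print(static_score(action, history))   # 0.7
print(deployed_score(action))          # 0.0
```

A held-out split of the same log would agree with the 0.7 estimate, so the usual train-test check passes. The gap appears only at deployment, when the act of exploiting the logged pattern changes the opponent's behavior.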

Overall Assessment

This is a well-crafted position paper that articulates an important and timely argument with intellectual rigor and breadth. Its primary contribution is conceptual reframing rather than technical novelty. The "solipsistic superintelligence" framing is effective and the paper may succeed in shifting discourse around AI safety and evaluation. However, its impact will ultimately depend on whether it catalyzes concrete research programs rather than remaining a compelling but abstract critique.

Rating: 6.5 / 10

Significance 7.5 · Rigor 5.5 · Novelty 6 · Clarity 8.5

Generated Jun 3, 2026

Comparison History (22)

vs. A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems

gemini-3.1 · 6/6/2026

Paper 1 addresses a foundational challenge in AGI development—AI alignment and cooperation—proposing a crucial paradigm shift away from solipsistic training. While Paper 2 offers a highly useful, practical tool for solid mechanics, Paper 1 has a vastly broader potential impact, potentially influencing the theoretical trajectory of AI safety, multi-agent reinforcement learning, and AI policy.

vs. DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions

gpt-5.2 · 6/6/2026

Paper 2 is likely to have higher scientific impact because it delivers a concrete, reusable dataset and benchmark (286K screenshots, 3.5M tasks) that can immediately accelerate and standardize research on GUI agents, with clear real-world applications in desktop/mobile automation. It also includes comparative evaluations and a fine-tuning baseline, supporting methodological rigor and near-term adoption by the community. Paper 1 is conceptually novel and timely in AI governance/alignment, but is more programmatic and harder to operationalize or validate empirically, which may limit short-term measurable impact.

vs. REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

gemini-3.1 · 6/5/2026

Paper 1 addresses a fundamental, high-stakes paradigm shift in AI safety and alignment, offering broad theoretical impact across multiple disciplines. In contrast, Paper 2 focuses on a specific technical improvement in Visual Question Answering, which, while methodologically rigorous, has a much narrower scope and application field.

vs. On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

gemini-3.1 · 6/5/2026

While Paper 1 offers a valuable conceptual paradigm shift for AI safety, Paper 2 demonstrates higher immediate scientific impact through methodological rigor and concrete empirical results. Paper 2 identifies a critical bottleneck in a highly relevant field (RL for LLM agents), provides theoretical analysis, and introduces a practical solution (AREW) yielding substantial performance gains. Its combination of actionable insights, reproducible methods (open-source code), and direct applicability to current cutting-edge AI development gives it a stronger, more measurable scientific trajectory than a conceptual position paper.

vs. Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

claude-opus-4.6 · 6/5/2026

Paper 1 presents a concrete, novel methodological contribution (AWARE framework) with empirical validation on clinical EHR data, addressing practical bottlenecks in deploying tabular in-context learning. It offers measurable improvements (12.2% AUPRC gain) on a well-defined benchmark. Paper 2, while thought-provoking, is primarily a position/perspective paper on AI cooperation and superintelligence risks. It proposes a conceptual paradigm shift but lacks empirical validation. Paper 1's actionable contributions to clinical AI, methodological rigor, and direct applicability to healthcare give it higher near-term and measurable scientific impact.

vs. HLL: Can Agents Cross Humanity's Last Line of Verification?

gemini-3.1 · 6/3/2026

Paper 1 addresses foundational issues in AI safety and alignment, proposing a paradigm shift towards multi-agent cooperation and institutional design. Its theoretical framework has broad, long-term implications across AI, economics, and ethics. While Paper 2 offers a valuable empirical benchmark, its focus on CAPTCHA-solving is narrower and potentially more transient as AI capabilities and verification methods rapidly evolve.

vs. RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

gpt-5.2 · 6/3/2026

Paper 1 has higher likely scientific impact: it presents a concrete, novel extension to an existing relational graph transformer with clear methodological contributions (masking to prevent leakage, unified heads, automated TF‑IDF text handling) and reports quantitative gains on a public benchmark (RelBench v2), supporting rigor and reproducibility. Its applications (enterprise/healthcare database autocomplete, data quality, decision support) are immediate and broadly useful across ML-for-data-management. Paper 2 is timely and potentially influential conceptually, but is primarily a position argument without demonstrated methods, benchmarks, or empirical validation, reducing near-term scientific and practical impact.

vs. Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations

gpt-5.2 · 6/3/2026

Paper 1 presents a concrete, technically novel method (continuous-time Mamba encoder + physics-guided latent SDE) with theoretical guarantees and an evaluation scheme for realistic irregular sampling—well-scoped, rigorously testable, and immediately applicable to industrial predictive maintenance and other irregular time-series domains. Its impact can extend across prognostics, time-series modeling, and continuous-time ML. Paper 2 raises important conceptual alignment/governance points, but is largely a position/agenda piece with less methodological specificity and harder-to-validate claims, making near-term scientific impact less predictable.

vs. Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

gemini-3.1 · 6/3/2026

Paper 2 addresses a critical, immediate need in agentic AI (authorization and delegation) with strong methodological rigor, including formal proofs and empirical evaluation. While Paper 1 provides a valuable conceptual framework for future AI alignment, Paper 2 offers concrete, actionable technical solutions that can be integrated into current systems, giving it a clearer path to near-term, widespread impact in AI engineering and security.

vs. EvoDrive: Pareto Evolution for Safety-Critical Autonomous Driving via Self-Improving LLM Agents

claude-opus-4.6 · 6/3/2026

Paper 1 addresses a fundamental paradigm-level challenge in AI safety and alignment—arguing that the dominant solipsistic AI design paradigm is structurally incompatible with cooperation, and proposing a new research direction treating interdependence as a core design principle. This has broad implications across AI safety, multi-agent systems, and AI governance. While Paper 2 presents a solid technical contribution (LLM-based evolution for autonomous driving testing), it is more incremental and domain-specific. Paper 1's conceptual reframing has potential to influence thinking across multiple fields and shape long-term AI development trajectories.

vs. SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

claude-opus-4.6 · 6/3/2026

Paper 1 addresses a fundamental, paradigm-level challenge in AI safety and alignment—arguing that the dominant solipsistic AI design paradigm is structurally incompatible with cooperation, and calling for a new research paradigm centered on interdependence. This has broad implications across AI safety, multi-agent systems, institutional design, and policy. Its timeliness is exceptional given rapid AI capability advances. Paper 2, while technically solid with strong empirical results on skill selection for LLM agents, addresses a narrower engineering problem with incremental improvements on specific benchmarks, limiting its broader scientific impact.

vs. Unveiling the Structure of Do-Calculus Reasoning via Derivation Graphs

gpt-5.2 · 6/3/2026

Paper 1 offers a concrete, novel technical contribution to causal inference: derivation graphs that characterize do-calculus equivalence classes, a bounded-step procedure, and a practical route to multiple estimands and potentially more efficient estimators. This is methodologically rigorous, immediately actionable for identification/estimation workflows, and likely to impact statistics, ML, epidemiology, and econometrics. Paper 2 is timely and broadly relevant but is primarily a conceptual position paper with less formal methodology and fewer directly testable/implementable results, which typically yields less near-term scientific uptake.

vs. Gender-Dependent Diagnostic Substitution in LLM Medical Triage: Same Symptoms, Unequal Urgency

gemini-3.1 · 6/3/2026

Paper 1 provides rigorous empirical evidence of a critical, real-world vulnerability in AI medical triage. Its identification of diagnostic substitution as the mechanism for bias offers actionable insights for immediate improvements in clinical LLMs. While Paper 2 presents an important theoretical framework for AI alignment, Paper 1's concrete methodology, timeliness, and direct implications for patient safety give it higher immediate scientific and societal impact.

vs. A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

gemini-3.1 · 6/3/2026

Paper 2 presents a broad, conceptual paradigm shift for AI alignment and multi-agent cooperation, which has the potential to influence research agendas across AI safety, economics, and systems design. In contrast, Paper 1 is a highly scoped negative result in mechanistic interpretability which, while methodologically useful, is likely to have a much narrower readership and fewer citations.

vs. LAP: An Agent-to-Instrument Protocol for Autonomous Science

gpt-5.2 · 6/3/2026

Paper 2 proposes a concrete, standards-oriented protocol (LAP) with explicit primitives (capabilities, reservations, safety handshakes, physically typed measurement schemas) that can be implemented and adopted across labs, enabling scalable autonomous science infrastructure. Its methodological rigor is higher (architecture, state machines, error model, schemas, compatibility with existing standards) and it has clear near-term real-world applications with broad impact across chemistry, biology, materials, and robotics. Paper 1 is conceptually novel and timely for AI governance/alignment, but is primarily a framing argument with less actionable, testable methodology, making impact less direct.

vs. MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

gemini-3.1 · 6/3/2026

Paper 1 provides a rigorous, immediately applicable benchmark for a critical real-world domain (healthcare), addressing a tangible gap in evaluating clinical AI safety. While Paper 2 offers a broad theoretical perspective on AI alignment, Paper 1 demonstrates superior methodological rigor through its interactive testbed, deterministic safety checking, and empirical baseline evaluation across 23 agents, guaranteeing immediate, measurable scientific impact and practical utility.

vs. DMF: A Deterministic Memory Framework for Conversational AI Agents

claude-opus-4.6 · 6/3/2026

Paper 1 addresses a fundamental, paradigm-level challenge in AI safety and alignment—how superintelligent systems can be designed for cooperation rather than solipsistic optimization. This has broad implications across AI safety, multi-agent systems, institutional design, and policy. Its conceptual framework (self-undermining property, non-solipsistic design) could reshape how the field approaches AI development. Paper 2, while practically useful in reducing token costs for conversational memory, is a more incremental engineering contribution with narrower scope. The timeliness of Paper 1's alignment concerns and its breadth of interdisciplinary impact give it substantially higher potential scientific influence.

vs. Towards Non-Monotonic Entailment in Propositional Defeasible Standpoint Logic

gpt-5.2 · 6/3/2026

Paper 1 targets a timely, high-stakes problem in AI: aligning increasingly capable systems with societal coexistence under endogenous feedback and deployment-induced nonstationarity. Its framing (self-undermining unilateral optimization, equilibrium-selection, institutions as design primitives) is broadly relevant across AI safety, evaluation, multi-agent systems, and policy, with clear real-world implications and potential to redirect research agendas. Paper 2 is methodologically rigorous and novel within non-monotonic/modal defeasible reasoning, but its impact is likely narrower (formal logic community) with less immediate cross-field or societal application.

vs. The Violation Situation Pattern: A Knowledge-Graph Pattern for Compliance Violations

claude-opus-4.6 · 6/3/2026

Paper 2 addresses a fundamental and timely challenge in AI safety—the cooperative alignment of superintelligent systems—which has enormous breadth of impact across AI research, policy, and society. Its conceptual reframing of the solipsistic optimization paradigm and call for non-solipsistic design principles could influence major research directions in AI alignment, multi-agent systems, and AI governance. Paper 1, while technically sound, addresses a narrower knowledge-graph engineering problem in compliance management with limited cross-domain impact. The timeliness and societal relevance of AI coexistence far exceeds that of compliance graph patterns.

vs. What Makes Interaction Trajectories Effective for Training Terminal Agents?

gpt-5.2 · 6/3/2026

Paper 2 has higher likely impact: it introduces a concrete, scalable methodology (Terminal-Lego) with measurable, reproducible gains and a testable mechanism (Environment-Grounded Supervision) explaining a counterintuitive “pedagogical paradox.” Its results are timely for post-training agents, offer immediate real-world applicability (data-efficient training, harness engineering), and can generalize across domains where interaction traces and verification exist. Paper 1 is conceptually novel and broadly relevant to AI alignment/institutions, but is more programmatic and less empirically grounded, reducing near-term methodological rigor and actionable uptake.