Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

Yaoxi Shi, Cathy Mengying Fang, Pattie Maes, Amit Goldenberg

#1235 of 3404 · Artificial Intelligence
Tournament Score: 1433±46
Win Rate: 60% (9 wins, 6 losses, 15 matches)
Rating: 5.5/10 (Significance, Rigor, Novelty, Clarity)

Abstract

Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two counts, both in how AI emotional support arises and how it shapes future behavior. First, AI emotional support commonly emerges incidentally within task-oriented interactions on general-purpose platforms, much as workplace friendships deepen through collaboration. Second, these incidental encounters are path-dependent: positive experiences of AI emotional support update people's beliefs about AI's emotional capabilities and redirect their choices for future emotional support, increasing preference for AI and decreasing preference for humans. We review recent evidence, including a large-scale longitudinal study conducted in collaboration with OpenAI, showing that daily five-minute conversations with an AI about personal issues over 28 days led to a 10.3% decrease in the preference for seeking support from humans and an 11.6% increase in the preference for AI. These findings suggest that current policy, focused on companion apps and isolated interactions, cannot adequately protect human connection. Instead, effective regulations should extend to general-purpose AI systems and address cumulative, trajectory-level changes in how people seek support. Recognizing how people stumble into AI emotional support and how those encounters redirect human connections over time is essential to safeguarding human well-being.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper advances a conceptual framework arguing that AI emotional dependence arises through two underappreciated dynamics: (1) incidental emergence of emotional support during task-oriented AI interactions, and (2) path dependence whereby positive emotional support experiences with AI progressively shift users' preferences away from human connection. The main novelty lies in reframing the policy discussion—away from the assumption that AI emotional support is deliberately sought through companion apps, toward recognition that it emerges organically in general-purpose platforms and accumulates over time to reshape support-seeking behavior.

The paper is best characterized as a perspective or policy commentary rather than a primary empirical contribution. Its central empirical evidence comes from a separate longitudinal study (Fang et al., 2025, reference 32) conducted in collaboration with OpenAI, which this paper cites rather than reports in full. The paper's value-add is therefore primarily conceptual and rhetorical: synthesizing disparate findings into a coherent narrative with clear policy implications.

Methodological Rigor

As a perspective piece, this paper does not present original experimental methodology. The arguments are built on a review of recent literature and one key longitudinal study. Several methodological concerns warrant attention:

1. Reliance on a single pivotal study: The headline finding (10.3% decrease in preference for humans, 11.6% increase in preference for AI after 28 days) comes from one study. The paper leans heavily on this result without adequate discussion of its limitations beyond noting that preferences are self-reported. Effect sizes, confidence intervals, and statistical significance details are absent from this paper (though presumably available in the original).

2. Preference vs. behavior gap: The authors acknowledge that preference measures may not track actual behavior, but this is a critical limitation that deserves more weight. A 10% shift in stated preference does not necessarily translate to meaningful behavioral change in real-world support-seeking.

3. Causal mechanism underspecified: The path dependence framework is borrowed from political science and economics (Pierson, 2000) but applied loosely. True path dependence involves increasing returns and lock-in effects. The paper asserts these exist (via sycophancy, engagement optimization, memory features) but does not empirically demonstrate the increasing-returns mechanism. The observed preference shift could reflect simple exposure effects or demand characteristics rather than genuine path dependence.

4. Selection of evidence: The literature review is selective rather than systematic. The paper cites evidence supporting its thesis, and it acknowledges but does not deeply engage with the debate over whether AI emotional support is beneficial or harmful.
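
To make the increasing-returns idea in point 3 concrete, here is a minimal Pólya-urn sketch of path dependence. It is an illustration only, not a model from the paper or from Pierson (2000): the function names, the `reinforcement` parameter, and the mapping of "choosing AI" onto a ball draw are all assumptions made for this example. Each choice of a support source makes the same source slightly more likely next time, so early draws have a lasting influence on where the share of AI choices settles.

```python
import random


def simulate_urn(steps, reinforcement=1, seed=None):
    """Return the final share of 'AI' choices in a Polya-urn process.

    Hypothetical toy model: the urn starts with one 'AI' ball and one
    'human' ball. Each step draws a ball with probability proportional
    to the current counts, then adds `reinforcement` balls of that kind.
    With reinforcement=0 nothing is reinforced and the share stays 0.5.
    """
    rng = random.Random(seed)
    ai, human = 1, 1
    for _ in range(steps):
        if rng.random() < ai / (ai + human):
            ai += reinforcement      # an AI choice makes AI more likely next time
        else:
            human += reinforcement   # a human choice does the same for humans
    return ai / (ai + human)


def final_shares(runs, steps, reinforcement=1):
    """Run the urn `runs` times with seeds 0..runs-1 and collect final shares."""
    return [simulate_urn(steps, reinforcement, seed=i) for i in range(runs)]


if __name__ == "__main__":
    shares = final_shares(runs=1000, steps=500)
    spread = max(shares) - min(shares)
    print(f"final AI share varies by {spread:.2f} across otherwise identical runs")
```

Under these assumptions, identical starting conditions lead to very different long-run shares, which is the signature the reviewer asks the paper to demonstrate. A simple exposure effect, by contrast, would correspond to a shift that does not depend on the order of earlier choices.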

Potential Impact

The paper's greatest strength is its policy relevance and timeliness. The specific policy proposals are concrete and actionable:

  • Extending regulation beyond companion apps to general-purpose AI systems
  • Requiring transparency about trajectory-level behavioral changes
  • Mandating notifications when interactions shift from task to emotional support
  • Decoupling emotional support interactions from engagement metrics
  • Capping cross-session memory in emotional conversations
  • Requiring independent audits of population-level effects

These recommendations could meaningfully influence regulatory frameworks like California's SB 243 and similar legislation globally. The framing, that regulation should shift from a "product-safety model" to a "behavioral model", is a useful conceptual contribution to AI governance discourse.

The paper could influence adjacent fields including human-computer interaction, clinical psychology, public health, and technology policy. The workplace friendship analogy is intuitive and could shape how non-specialist policymakers think about AI emotional support.

Timeliness & Relevance

This paper is extremely timely. It arrives amid active legislative debates (SB 243 signed October 2025), high-profile incidents (the Raine family testimony), and growing public concern about AI's psychological effects. The argument that general-purpose AI platforms, not just companion apps, pose emotional dependency risks fills a genuine gap in current policy thinking. The paper speaks directly to an emerging bottleneck: regulators are moving quickly but may be targeting the wrong systems and the wrong timescale.

Strengths & Limitations

Strengths:

  • Clear, accessible writing that effectively communicates complex dynamics to both academic and policy audiences
  • The "stumbling into" metaphor is powerful and captures a real phenomenon
  • Concrete, implementable policy recommendations rather than vague calls for more research
  • The distinction between incidental emergence and path dependence provides a useful analytical framework
  • Draws on evidence from multiple sources including platform-level data (3% of ChatGPT/Claude interactions involve emotional support)

Limitations:

  • Not a primary research contribution: The paper synthesizes others' work and proposes a framework, but generates no new empirical evidence
  • Conceptual looseness: Key concepts like "path dependence" and "stumbling" are used metaphorically rather than formally defined, making them difficult to test or falsify
  • Missing counterfactual reasoning: Would the same preference shifts occur with any repeated positive interaction (e.g., daily journaling, talking to a stuffed animal)? Without control conditions that isolate the AI-specific mechanism, the path dependence argument is weakened
  • Omits counterarguments: The paper does not seriously engage with the possibility that AI emotional support could be net positive for population well-being, or that preference shifts might be rational adaptations
  • Brief treatment of implementation challenges: Detecting when a conversation shifts from task to emotional support is technically non-trivial, and the paper does not address the difficulty
  • No competing interests, but collaboration with OpenAI: The longitudinal study was conducted with OpenAI, and the nature of that collaboration deserves more transparency
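
The counterfactual concern above, together with the earlier note that confidence intervals are not reported, can be sketched as a comparison of pre-to-post change in an AI arm against a control arm, with a percentile bootstrap interval. This is a hedged illustration: the function names, the control condition, and the numbers in the usage example are hypothetical and are not taken from Fang et al. (2025) or from the paper.

```python
import random
from statistics import mean


def diff_in_changes(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Mean pre-to-post change in the AI arm minus the change in the control arm."""
    treat_change = mean(post - pre for pre, post in zip(treat_pre, treat_post))
    ctrl_change = mean(post - pre for pre, post in zip(ctrl_pre, ctrl_post))
    return treat_change - ctrl_change


def bootstrap_ci(treat_pre, treat_post, ctrl_pre, ctrl_post,
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling participants within each arm."""
    rng = random.Random(seed)
    treat = list(zip(treat_pre, treat_post))
    ctrl = list(zip(ctrl_pre, ctrl_post))
    estimates = []
    for _ in range(n_boot):
        t = [rng.choice(treat) for _ in treat]   # resample with replacement
        c = [rng.choice(ctrl) for _ in ctrl]
        estimates.append(diff_in_changes(
            [pre for pre, _ in t], [post for _, post in t],
            [pre for pre, _ in c], [post for _, post in c]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical 1-7 ratings of preference for human support (made-up data).
    ai_pre, ai_post = [5, 6, 4, 5, 6, 5], [4, 5, 4, 4, 5, 5]
    ctrl_pre, ctrl_post = [5, 5, 6, 4, 5, 6], [5, 4, 6, 4, 5, 6]
    print(diff_in_changes(ai_pre, ai_post, ctrl_pre, ctrl_post))
    print(bootstrap_ci(ai_pre, ai_post, ctrl_pre, ctrl_post))
```

If a control such as daily journaling produced a similar drop, the difference would sit near zero and the AI-specific reading would be hard to sustain. An interval that excludes zero would be the kind of evidence the path dependence argument currently lacks.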

Overall Assessment

This is a well-timed, clearly argued perspective piece that could meaningfully influence AI policy debates. However, its scientific contribution is modest: it synthesizes existing work into a framework rather than generating new knowledge. The empirical foundation is thinner than the confident policy prescriptions suggest. The paper would benefit from more rigorous engagement with alternative explanations, formal specification of its theoretical mechanisms, and honest acknowledgment of the substantial gap between the evidence presented and the policy conclusions drawn.

Rating: 5.5/10
Significance 6.5 · Rigor 4 · Novelty 5.5 · Clarity 8

Generated Jun 5, 2026

Comparison History (15)

    vs. FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG
    claude-opus-4.6 · 6/6/2026

    Paper 1 addresses a timely, broadly impactful issue at the intersection of AI, psychology, and policy. Its findings—that routine AI interactions incidentally reshape human emotional support preferences—have profound implications for regulation, mental health, and society. The large-scale longitudinal study with OpenAI provides compelling empirical evidence. Its cross-disciplinary relevance (psychology, HCI, policy, ethics) and timeliness amid rapid AI adoption give it exceptionally broad impact potential. Paper 2, while technically rigorous and valuable for the NLP community, addresses a narrower technical problem (RAG faithfulness) with more limited audience and societal implications.

    vs. MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction
    gpt-5.2 · 6/6/2026

    Paper 1 likely has higher scientific impact: it proposes a technically novel, deployable framework (reinforced heterogeneous distillation + curriculum/EWC) with clear methodological contributions, extensive multi-benchmark evaluation, and strong real-world applicability for autonomous driving under compute constraints. Its rigor and replicable metrics (latency/compression/accuracy across datasets) support broad uptake in robotics/AV and efficient ML. Paper 2 addresses a timely, important societal issue, but as described it is largely a synthesis/argument drawing on external evidence; its impact depends on policy translation and the strength/generalizability of the cited study rather than a new method or dataset.

    vs. SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems
    gpt-5.2 · 6/6/2026

    Paper 2 likely has higher scientific impact: it introduces a general, compute-matched evaluation framework (SAGE) for a timely problem—social/peer effects in self-improving agent ecosystems—tested across multiple arenas with controlled comparisons and ablations on forms of shared history. The methodological design (SelfEvo vs SocialEvo, counterfactual controls) supports broader, reusable insights for AI research, multi-agent systems, and evaluation practice. Paper 1 is timely and societally important but appears more review/argument-driven and narrower in technical generalizability, limiting cross-field methodological impact.

    vs. Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo
    gemini-3.1 · 6/5/2026

    While Paper 1 offers a valuable technical framework for improving AI reliability, Paper 2 demonstrates profound, interdisciplinary impact spanning AI, psychology, sociology, and public policy. By leveraging a large-scale longitudinal study with OpenAI, Paper 2 uncovers a critical societal shift—how incidental AI interactions reduce the desire for human connection. Its focus on human well-being and direct implications for the regulation of general-purpose AI systems give it exceptional timeliness, broader real-world relevance, and a higher potential for widespread scientific citations and policy influence.

    vs. Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
    gpt-5.2 · 6/5/2026

    Paper 2 is more likely to have higher scientific impact: it introduces a technically novel, scalable method (synthetic fMRI augmentation via a large pretrained encoding model) with strong quantitative gains (up to 68% Top-10 retrieval) and intriguing zero-shot implications, enabling progress in a data-limited field. Its methodology appears more directly testable and generalizable across datasets, and it can affect neuroscience, machine learning, neuroimaging methodology, and BCI applications. Paper 1 is timely and societally relevant but is more policy/behavioral framing with limited methodological detail in the abstract, and its impact may be narrower and harder to generalize.

    vs. On the evolution of the concept of probability as a mirror of the evolution of reason
    gpt-5.2 · 6/5/2026

    Paper 2 has higher potential impact due to timeliness, direct real-world relevance, and an empirically grounded, policy-actionable claim about longitudinal behavioral change from routine AI use. It engages multiple fields (HCI, psychology, AI ethics/policy, public health) and leverages large-scale longitudinal evidence, supporting methodological rigor and generalizability concerns central to current AI deployment. Paper 1 is a valuable integrative/philosophical synthesis, but it appears primarily interpretive with less immediately testable novelty and fewer near-term applications compared to Paper 2’s actionable findings.

    vs. An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
    gemini-3.1 · 6/5/2026

    Paper 1 addresses a highly timely and universally relevant societal issue: human emotional dependence on general-purpose AI. Its large-scale longitudinal findings on behavioral shifts (preferring AI over humans) have broad, cross-disciplinary implications for psychology, AI development, and public policy. While Paper 2 is methodologically rigorous and valuable for clinical osteoarthritis research, Paper 1's findings have a much wider potential impact across multiple fields and address urgent global concerns regarding human connection in the AI era.

    vs. R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search
    gpt-5.2 · 6/5/2026

    Paper 1 proposes a novel, technically detailed protocol (R-APS) that targets multiple known failure modes in agentic LLM systems with measurable gains on a grounded robotics/mechanism-synthesis benchmark using solver-verified evaluation, supporting methodological rigor and potential cross-domain applicability to constrained design, planning, and robust optimization. Its timeliness is high given current focus on reliable tool-using agents. Paper 2 is societally important and timely, but appears largely as a synthesis/argument over existing evidence with less methodological novelty and narrower direct technical generalizability, reducing expected scientific impact relative to Paper 1.

    vs. Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts
    gemini-3.1 · 6/5/2026

    Paper 1 offers groundbreaking empirical evidence on a profound societal shift: the incidental emergence of AI emotional dependence and its measurable reduction of human connection. Its longitudinal approach and deep psychological implications give it broader impact across HCI, psychology, sociology, and policy. While Paper 2 presents a valuable AI safety benchmark for conflict contexts, Paper 1 addresses a ubiquitous, everyday phenomenon affecting a vastly larger population, promising wider scientific citations and fundamental shifts in how we understand human-AI interaction.

    vs. Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI
    gemini-3.1 · 6/5/2026

    Paper 2 has higher potential impact due to its broad interdisciplinary reach across HCI, psychology, and public policy. While Paper 1 offers a rigorous technical framework for AI security, Paper 2 addresses a profound societal issue: how routine AI use inadvertently alters fundamental human connections. Backed by a large-scale longitudinal study in collaboration with OpenAI, its empirical findings on behavioral shifts will likely trigger significant academic discourse, attract mainstream attention, and directly shape future AI regulations, resulting in a much larger overall scientific footprint.

    vs. Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
    gemini-3.1 · 6/5/2026

    Paper 1 addresses a critical societal issue with broad psychological, ethical, and policy implications. Backed by large-scale empirical data, its findings on how routine AI use diminishes preference for human connection will likely spark extensive cross-disciplinary research and shape AI regulation. Paper 2, while methodologically rigorous and practical, has an impact primarily confined to software engineering and AI agent design.

    vs. SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
    gpt-5.2 · 6/5/2026

    Paper 1 offers a concrete, technically novel RL framework (self-mined, validated, and distilled skills) that removes inference-time skill banks, reducing deployment complexity while improving benchmark performance with open models. It is methodologically clearer and more reproducible (code released, quantified gains, controlled comparisons) and can generalize across LLM-agent settings, impacting RL, agentic LLM training, and tool-use. Paper 2 is timely and societally important, but is primarily a synthesis/argument around emerging evidence; impact depends on external datasets and policy uptake, with less methodological novelty in the paper itself.

    vs. TSQAgent: Rating Time Series Data Quality via Dedicated Agentic Reasoning
    gemini-3.1 · 6/5/2026

    Paper 2 addresses a highly timely and critical societal issue with broad interdisciplinary impact across psychology, AI ethics, HCI, and policy. Its large-scale longitudinal evidence challenges current policy assumptions about AI emotional dependence, suggesting significant real-world implications. In contrast, Paper 1, while methodologically sound, is relatively narrow in scope, primarily impacting the specialized field of time series data engineering.

    vs. The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents
    gpt-5.2 · 6/5/2026

    Paper 2 has higher likely scientific impact due to stronger methodological rigor and broader, timelier relevance to autonomous-agent safety. It provides systematic empirical evaluation across multiple trigger architectures, models, and costs; identifies a general failure mode (state saturation) and, crucially, demonstrates low human inter-rater reliability, challenging the validity of common supervised targets. These findings can reshape how the field frames and benchmarks intervention-timing, influencing safety evaluation, dataset design, and deployment practices across agentic systems. Paper 1 is important but appears more interpretive/review-oriented and narrower to human-AI relational outcomes.

    vs. Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
    claude-opus-4.6 · 6/5/2026

    Paper 1 addresses a broadly significant and timely societal issue—how routine AI interactions reshape human emotional connections—with large-scale longitudinal evidence from an OpenAI collaboration. Its findings have immediate policy implications affecting billions of AI users and span psychology, HCI, policy, and AI ethics. The 10.3% decrease in human support-seeking preference is a striking, widely communicable result. Paper 2, while technically strong and clinically valuable, addresses a narrower domain (lung cancer trajectory modeling) with incremental methodological advances in multi-agent LLM systems, limiting its cross-disciplinary reach.