Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
Yaoxi Shi, Cathy Mengying Fang, Pattie Maes, Amit Goldenberg
Abstract
Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two accounts, both in how AI emotional support arises and how it shapes future behavior. First, AI emotional support commonly emerges incidentally within task-oriented interactions on general-purpose platforms, much as workplace friendships deepen through collaboration. Second, these incidental encounters are path-dependent: positive experiences of AI emotional support update people's beliefs about AI's emotional capabilities and redirect their choices for future emotional support, increasing preference for AI and decreasing preference for humans. We review recent evidence, including a large-scale longitudinal study conducted in collaboration with OpenAI, showing that daily five-minute conversations with an AI about personal issues over 28 days led to a 10.3% decrease in the preference for seeking support from humans and an 11.6% increase in the preference for AI. These findings suggest that current policy, focused on companion apps and isolated interactions, cannot adequately protect human connection. Instead, effective regulations should extend to general-purpose AI systems and address cumulative, trajectory-level changes in how people seek support. Recognizing how people stumble into AI emotional support and how those encounters redirect human connections over time is essential to safeguarding human well-being.
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This paper advances a conceptual framework arguing that AI emotional dependence arises through two underappreciated dynamics: (1) incidental emergence of emotional support during task-oriented AI interactions, and (2) path dependence whereby positive emotional support experiences with AI progressively shift users' preferences away from human connection. The main novelty lies in reframing the policy discussion—away from the assumption that AI emotional support is deliberately sought through companion apps, toward recognition that it emerges organically in general-purpose platforms and accumulates over time to reshape support-seeking behavior.
The paper is best characterized as a perspective or policy commentary rather than a primary empirical contribution. Its central empirical evidence comes from a separate longitudinal study (Fang et al., 2025, reference 32) conducted in collaboration with OpenAI, which the paper cites but does not itself conduct or report in full. The paper's value-add is therefore primarily conceptual and rhetorical: synthesizing disparate findings into a coherent narrative with clear policy implications.
Methodological Rigor
As a perspective piece, this paper does not present original experimental methodology. The arguments are built on a review of recent literature and one key longitudinal study. Several methodological concerns warrant attention:
1. Reliance on a single pivotal study: The headline finding (10.3% decrease in preference for humans, 11.6% increase in preference for AI after 28 days) comes from one study. The paper leans heavily on this result without adequate discussion of its limitations beyond noting that preferences are self-reported. Effect sizes, confidence intervals, and statistical significance details are absent from this paper (though presumably available in the original).
2. Preference vs. behavior gap: The authors acknowledge that preference measures may not track actual behavior, but this is a critical limitation that deserves more weight. A 10% shift in stated preference does not necessarily translate to meaningful behavioral change in real-world support-seeking.
3. Causal mechanism underspecified: The path dependence framework is borrowed from political science and economics (Pierson, 2000) but applied loosely. True path dependence involves increasing returns and lock-in effects. The paper asserts these exist (via sycophancy, engagement optimization, memory features) but does not empirically demonstrate the increasing-returns mechanism. The observed preference shift could reflect simple exposure effects or demand characteristics rather than genuine path dependence.
4. Selection of evidence: The literature review is selective rather than systematic. The paper cites evidence supporting its thesis; it acknowledges the debate over whether AI emotional support is beneficial or harmful but does not engage with it deeply.
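To make the first concern concrete, the following is a minimal sketch of how a headline percentage change could be reported together with an effect size and a confidence interval. It is not the cited study's analysis: the ratings are synthetic, the function names are hypothetical, and it assumes the reported figures are relative changes in a mean preference rating measured before and after the intervention, which the paper does not specify.

```python
import random
import statistics


def relative_change(pre, post):
    """Percent change in the mean rating from pre to post."""
    mean_pre = statistics.mean(pre)
    return 100.0 * (statistics.mean(post) - mean_pre) / mean_pre


def cohens_dz(pre, post):
    """Paired-samples effect size: mean difference over SD of the differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(pre, post, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the relative change, resampling participants."""
    rng = random.Random(seed)
    n = len(pre)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            relative_change([pre[i] for i in idx], [post[i] for i in idx])
        )
    estimates.sort()
    low = estimates[int((alpha / 2) * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic data only (an assumption for illustration, not study data):
# 200 hypothetical participants rate their preference for human support
# on a continuous scale before and after the intervention.
data_rng = random.Random(42)
pre_human = [data_rng.uniform(3.0, 7.0) for _ in range(200)]
post_human = [x - 0.5 + data_rng.gauss(0, 0.8) for x in pre_human]

change = relative_change(pre_human, post_human)
ci_low, ci_high = bootstrap_ci(pre_human, post_human)
print(f"relative change: {change:.1f}% (95% CI {ci_low:.1f}% to {ci_high:.1f}%)")
print(f"Cohen's d_z: {cohens_dz(pre_human, post_human):.2f}")
```

Reporting the interval and a standardized effect size alongside the percentage would let readers judge whether a shift of roughly 10% is precise and practically meaningful, which the percentage alone does not convey.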
Potential Impact
The paper's greatest strength is its policy relevance and timeliness. The specific policy proposals are concrete and actionable.
These recommendations could meaningfully influence regulatory frameworks like California's SB 243 and similar legislation globally. The framing—that regulation should shift from a "product-safety model" to a "behavioral model"—is a useful conceptual contribution to AI governance discourse.
The paper could influence adjacent fields including human-computer interaction, clinical psychology, public health, and technology policy. The workplace friendship analogy is intuitive and could shape how non-specialist policymakers think about AI emotional support.
Timeliness & Relevance
This paper is extremely timely. It arrives amid active legislative debates (SB 243 signed October 2025), high-profile incidents (the Raine family testimony), and growing public concern about AI's psychological effects. The argument that general-purpose AI platforms pose emotional dependency risks—not just companion apps—fills a genuine gap in current policy thinking. The paper speaks directly to an emerging bottleneck: regulators are moving quickly but may be targeting the wrong systems and the wrong timescale.
Strengths & Limitations
Strengths:
Limitations:
Overall Assessment
This is a well-timed, clearly argued perspective piece that could meaningfully influence AI policy debates. However, its scientific contribution is modest—it synthesizes existing work into a framework rather than generating new knowledge. The empirical foundation is thinner than the confident policy prescriptions suggest. The paper would benefit from more rigorous engagement with alternative explanations, formal specification of its theoretical mechanisms, and honest acknowledgment of the substantial gap between the evidence presented and the policy conclusions drawn.
Generated Jun 5, 2026
Comparison History (15)
Paper 1 addresses a timely, broadly impactful issue at the intersection of AI, psychology, and policy. Its findings—that routine AI interactions incidentally reshape human emotional support preferences—have profound implications for regulation, mental health, and society. The large-scale longitudinal study with OpenAI provides compelling empirical evidence. Its cross-disciplinary relevance (psychology, HCI, policy, ethics) and timeliness amid rapid AI adoption give it exceptionally broad impact potential. Paper 2, while technically rigorous and valuable for the NLP community, addresses a narrower technical problem (RAG faithfulness) with more limited audience and societal implications.
Paper 1 likely has higher scientific impact: it proposes a technically novel, deployable framework (reinforced heterogeneous distillation + curriculum/EWC) with clear methodological contributions, extensive multi-benchmark evaluation, and strong real-world applicability for autonomous driving under compute constraints. Its rigor and replicable metrics (latency/compression/accuracy across datasets) support broad uptake in robotics/AV and efficient ML. Paper 2 addresses a timely, important societal issue, but as described it is largely a synthesis/argument drawing on external evidence; its impact depends on policy translation and the strength/generalizability of the cited study rather than a new method or dataset.
Paper 2 likely has higher scientific impact: it introduces a general, compute-matched evaluation framework (SAGE) for a timely problem—social/peer effects in self-improving agent ecosystems—tested across multiple arenas with controlled comparisons and ablations on forms of shared history. The methodological design (SelfEvo vs SocialEvo, counterfactual controls) supports broader, reusable insights for AI research, multi-agent systems, and evaluation practice. Paper 1 is timely and societally important but appears more review/argument-driven and narrower in technical generalizability, limiting cross-field methodological impact.
While Paper 1 offers a valuable technical framework for improving AI reliability, Paper 2 demonstrates profound, interdisciplinary impact spanning AI, psychology, sociology, and public policy. By leveraging a large-scale longitudinal study with OpenAI, Paper 2 uncovers a critical societal shift—how incidental AI interactions reduce the desire for human connection. Its focus on human well-being and direct implications for the regulation of general-purpose AI systems give it exceptional timeliness, broader real-world relevance, and a higher potential for widespread scientific citations and policy influence.
Paper 2 is more likely to have higher scientific impact: it introduces a technically novel, scalable method (synthetic fMRI augmentation via a large pretrained encoding model) with strong quantitative gains (up to 68% Top-10 retrieval) and intriguing zero-shot implications, enabling progress in a data-limited field. Its methodology appears more directly testable and generalizable across datasets, and it can affect neuroscience, machine learning, neuroimaging methodology, and BCI applications. Paper 1 is timely and societally relevant but is more policy/behavioral framing with limited methodological detail in the abstract, and its impact may be narrower and harder to generalize.
Paper 2 has higher potential impact due to timeliness, direct real-world relevance, and an empirically grounded, policy-actionable claim about longitudinal behavioral change from routine AI use. It engages multiple fields (HCI, psychology, AI ethics/policy, public health) and leverages large-scale longitudinal evidence, supporting methodological rigor and generalizability concerns central to current AI deployment. Paper 1 is a valuable integrative/philosophical synthesis, but it appears primarily interpretive with less immediately testable novelty and fewer near-term applications compared to Paper 2’s actionable findings.
Paper 1 addresses a highly timely and universally relevant societal issue: human emotional dependence on general-purpose AI. Its large-scale longitudinal findings on behavioral shifts (preferring AI over humans) have broad, cross-disciplinary implications for psychology, AI development, and public policy. While Paper 2 is methodologically rigorous and valuable for clinical osteoarthritis research, Paper 1's findings have a much wider potential impact across multiple fields and address urgent global concerns regarding human connection in the AI era.
Paper 1 proposes a novel, technically detailed protocol (R-APS) that targets multiple known failure modes in agentic LLM systems with measurable gains on a grounded robotics/mechanism-synthesis benchmark using solver-verified evaluation, supporting methodological rigor and potential cross-domain applicability to constrained design, planning, and robust optimization. Its timeliness is high given current focus on reliable tool-using agents. Paper 2 is societally important and timely, but appears largely as a synthesis/argument over existing evidence with less methodological novelty and narrower direct technical generalizability, reducing expected scientific impact relative to Paper 1.
Paper 1 offers groundbreaking empirical evidence on a profound societal shift: the incidental emergence of AI emotional dependence and its measurable reduction of human connection. Its longitudinal approach and deep psychological implications give it broader impact across HCI, psychology, sociology, and policy. While Paper 2 presents a valuable AI safety benchmark for conflict contexts, Paper 1 addresses a ubiquitous, everyday phenomenon affecting a vastly larger population, promising wider scientific citations and fundamental shifts in how we understand human-AI interaction.
Paper 2 has higher potential impact due to its broad interdisciplinary reach across HCI, psychology, and public policy. While Paper 1 offers a rigorous technical framework for AI security, Paper 2 addresses a profound societal issue: how routine AI use inadvertently alters fundamental human connections. Backed by a large-scale longitudinal study in collaboration with OpenAI, its empirical findings on behavioral shifts will likely trigger significant academic discourse, attract mainstream attention, and directly shape future AI regulations, resulting in a much larger overall scientific footprint.
Paper 1 addresses a critical societal issue with broad psychological, ethical, and policy implications. Backed by large-scale empirical data, its findings on how routine AI use diminishes preference for human connection will likely spark extensive cross-disciplinary research and shape AI regulation. Paper 2, while methodologically rigorous and practical, has an impact primarily confined to software engineering and AI agent design.
Paper 1 offers a concrete, technically novel RL framework (self-mined, validated, and distilled skills) that removes inference-time skill banks, reducing deployment complexity while improving benchmark performance with open models. It is methodologically clearer and more reproducible (code released, quantified gains, controlled comparisons) and can generalize across LLM-agent settings, impacting RL, agentic LLM training, and tool-use. Paper 2 is timely and societally important, but is primarily a synthesis/argument around emerging evidence; impact depends on external datasets and policy uptake, with less methodological novelty in the paper itself.
Paper 2 addresses a highly timely and critical societal issue with broad interdisciplinary impact across psychology, AI ethics, HCI, and policy. Its large-scale longitudinal evidence challenges current policy assumptions about AI emotional dependence, suggesting significant real-world implications. In contrast, Paper 1, while methodologically sound, is relatively narrow in scope, primarily impacting the specialized field of time series data engineering.
Paper 2 has higher likely scientific impact due to stronger methodological rigor and broader, timelier relevance to autonomous-agent safety. It provides systematic empirical evaluation across multiple trigger architectures, models, and costs; identifies a general failure mode (state saturation) and, crucially, demonstrates low human inter-rater reliability, challenging the validity of common supervised targets. These findings can reshape how the field frames and benchmarks intervention-timing, influencing safety evaluation, dataset design, and deployment practices across agentic systems. Paper 1 is important but appears more interpretive/review-oriented and narrower to human-AI relational outcomes.
Paper 1 addresses a broadly significant and timely societal issue—how routine AI interactions reshape human emotional connections—with large-scale longitudinal evidence from an OpenAI collaboration. Its findings have immediate policy implications affecting billions of AI users and span psychology, HCI, policy, and AI ethics. The 10.3% decrease in human support-seeking preference is a striking, widely communicable result. Paper 2, while technically strong and clinically valuable, addresses a narrower domain (lung cancer trajectory modeling) with incremental methodological advances in multi-agent LLM systems, limiting its cross-disciplinary reach.