The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning
Shang Wu, Hongyu Yao, Catarina Belem, Shuyuan Fu, Mark Steyvers, Padhraic Smyth
Abstract
Artificial intelligence (AI) is increasingly being integrated into human problem-solving, yet its effects on individual skill development remain unclear. We examine how both AI usage and informativeness can shape learning in the context of a controlled logical reasoning task with on-demand access to AI assistance. We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. We also find that these patterns are mediated by AI informativeness. Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. High-information AI, by contrast, improves short-run performance without reducing post-AI outcomes on average in our experiments, although its effects are heterogeneous across participants. Overall, our findings suggest that, depending on context, AI can either complement human skill development by amplifying independent reasoning or act as a substitute that undermines it, implying that regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.
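The matched comparison in the abstract (heavy and light AI users versus comparable non-users) can be illustrated with a minimal sketch. It assumes a single matching covariate, the Phase 1 (pre-AI) score that the assessment refers to, and 1-nearest-neighbor matching with replacement. This is a simplified stand-in, not the paper's actual propensity score specification, and the `Participant` record, `matched_gap` helper, and all data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Participant:
    pid: str
    phase1: float  # pre-AI score (hypothetical units, e.g. puzzles solved)
    post: float    # score measured after AI access is removed


def matched_gap(group, controls):
    """Mean post-AI difference between each member of `group` and the
    non-AI control with the closest Phase 1 score (1-NN, with replacement)."""
    diffs = []
    for p in group:
        # Ties resolve to the first control listed, since min() keeps the first minimum.
        match = min(controls, key=lambda c: abs(c.phase1 - p.phase1))
        diffs.append(p.post - match.post)
    return mean(diffs)


# Hypothetical data, invented for illustration only.
no_ai = [Participant("c1", 4.0, 5.0), Participant("c2", 6.0, 7.0), Participant("c3", 8.0, 9.0)]
heavy = [Participant("h1", 4.0, 3.0), Participant("h2", 8.0, 7.0)]
light = [Participant("l1", 6.0, 7.0), Participant("l2", 8.0, 9.0)]

print(matched_gap(heavy, no_ai))  # → -2.0
print(matched_gap(light, no_ai))  # → 0.0
```

Matching with replacement lets several users share one control. A real analysis would typically estimate a propensity score from several pre-treatment covariates and check balance after matching; matching on one observed score cannot rule out the unobserved differences (motivation, self-regulation) raised in the weaknesses.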
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This paper investigates how AI usage intensity and informativeness affect human skill development in logical reasoning tasks, using a controlled pre-/post-assessment experimental design. The main novelty lies in the intersection of three factors: (1) measuring skill development *after* AI removal rather than during AI use, (2) experimentally manipulating AI informativeness while holding accuracy constant, and (3) examining individual heterogeneity in how users engage with AI assistance. The key finding is that heavy AI usage is associated with weaker skill development, and this relationship is moderated by AI informativeness—low-information AI uniformly harms learning while high-information AI produces heterogeneous effects that widen ability gaps.
Methodological Rigor
The experimental design has notable strengths but also significant limitations:
Strengths:
Weaknesses:
Potential Impact
The paper addresses a timely concern about AI's effects on human cognition and learning. The practical implications are relevant to:
However, the impact is tempered by the narrow experimental context. The logic-puzzle paradigm, while controlled, is far removed from the educational and professional domains where these findings would matter most (e.g., medical diagnosis, programming, essay writing). The findings also largely confirm existing intuitions rather than reveal surprising mechanisms: that heavy reliance on external tools reduces independent learning is well established in the educational psychology literature on scaffolding and desirable difficulties.
Timeliness & Relevance
The paper is highly timely. As LLMs and AI assistants become ubiquitous, understanding their impact on human skill development is urgent. The paper arrives amid growing concern about "cognitive offloading" and "deskilling" in AI-assisted environments. It speaks directly to debates in education about ChatGPT policies and in medicine about diagnostic deskilling with AI support. The HHAI 2026 venue is appropriate.
Strengths & Limitations
Key Strengths:
1. The pre-/post-AI assessment design is the paper's strongest methodological contribution, directly measuring what matters—performance after AI is removed.
2. The distinction between AI informativeness levels is a clean experimental manipulation that yields interpretable results.
3. The heterogeneity analysis revealing that high-information AI widens ability gaps is the most novel and policy-relevant finding.
4. The "solo share" metric captures a meaningful behavioral dimension of cognitive engagement.
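The review does not define the solo share metric. Assuming it is the fraction of problems attempted during the AI-available phase on which the participant never queried the AI, it could be computed as in this sketch; the `Attempt` record, its fields, and the `solo_share` helper are hypothetical.

```python
from typing import List, NamedTuple


class Attempt(NamedTuple):
    problem_id: int
    ai_queries: int  # AI requests made while working on this problem


def solo_share(attempts: List[Attempt]) -> float:
    """Fraction of attempted problems on which no AI query was made
    (an assumed definition of "solo share")."""
    if not attempts:
        raise ValueError("solo share is undefined with no attempts")
    solo = sum(1 for a in attempts if a.ai_queries == 0)
    return solo / len(attempts)


log = [Attempt(1, 0), Attempt(2, 3), Attempt(3, 0), Attempt(4, 1)]
print(solo_share(log))  # → 0.5
```

Raising an error for participants with no attempts, rather than returning 0, keeps them from being silently counted as fully AI-reliant.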
Notable Weaknesses:
1. Causal claims exceed design: Usage intensity is not randomly assigned, yet much of the discussion implies causal relationships. The PSM approach mitigates but does not resolve this.
2. Limited statistical power: Key subgroup findings rely on small cells (splitting 43 high-info participants by ability level yields ~21 per group), and several results hover around conventional significance thresholds.
3. Task artificiality: Logic puzzles with deterministic solutions and perfect AI accuracy are far from real-world AI-assisted learning scenarios.
4. Short time horizon: 20 minutes of AI exposure is insufficient to study "skill development" in any meaningful sense; what the study measures is more accurately characterized as short-term performance transfer.
5. Missing controls and analyses: No analysis of learning curves within phases, no examination of which specific problems benefited from AI, and limited analysis of what strategies participants actually developed.
6. Self-selection bias: The finding that heavy AI users perform worse is confounded by the possibility that individuals who rely heavily on AI are systematically different in ways not captured by Phase 1 performance alone (e.g., motivation, self-regulation).
7. Inflated confidence finding: The observation about miscalibrated self-assessment among lower-ability participants is suggestive but based on single survey items without validated scales.
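The power concern in weakness 2 can be made concrete with a minimum-detectable-effect calculation. Under conventional assumptions that are illustrative choices here, not the paper's (two-sided α = 0.05, 80% power, two equal groups, normal approximation to the two-sample t-test), about 21 participants per group can reliably detect only a standardized difference of roughly d = 0.86, which is a large effect by Cohen's benchmarks. The `min_detectable_d` helper is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest Cohen's d that a two-sided, two-sample comparison with equal
    group sizes detects at the given power (normal approximation)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


print(round(min_detectable_d(21), 2))  # → 0.86
```

The exact t-test threshold is slightly larger at this sample size, so the approximation is, if anything, optimistic about what the subgroup analyses can detect.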
Overall Assessment
This paper makes a relevant and timely contribution to an important question, with a reasonably well-designed experiment. However, its impact is limited by the artificial task domain, modest sample sizes, short time horizons, and causal inference challenges. The findings are directionally interesting and policy-relevant but not definitive. The paper is best viewed as a well-motivated pilot study that points toward important dynamics requiring larger-scale, longer-term, and more ecologically valid investigation. It is appropriate for a workshop or short conference paper at HHAI but would need substantially more evidence to influence policy or practice.
Generated May 22, 2026
Comparison History (19)
Paper 1 addresses a timely, broadly relevant question about AI's impact on human skill development with empirical evidence from controlled experiments. Its findings on how AI usage intensity and informativeness affect learning have immediate implications for education, workforce training, and AI policy—topics of enormous current societal interest. The nuanced finding that AI can complement or substitute for human reasoning depending on context is novel and actionable. Paper 2 presents an interesting software architecture (ActiveGraph) for agentic systems, but it is more niche, lacks empirical validation of its claimed benefits, and primarily contributes to AI engineering rather than generating broadly impactful scientific insights.
Paper 1 addresses a critical technical bottleneck in machine learning (explainability) by introducing a novel integration of causal discovery and argumentation frameworks. This methodological advancement has broad, scalable applicability across numerous high-stakes domains requiring interpretable AI, offering foundational algorithmic tools for future AI development that typically yield higher cross-disciplinary citations than the behavioral insights presented in Paper 2.
Paper 2 addresses a fundamental question about how AI usage affects human skill development and learning, with broad implications for education, workforce development, and AI policy. Its controlled experimental design examining causal mechanisms (AI informativeness as mediator) provides rigorous evidence on a timely topic relevant across multiple fields. Paper 1, while practically useful, is an engineering contribution (a Python framework reducing boilerplate) with narrow impact limited to a specific developer community and lacking scientific novelty beyond software engineering convenience.
Paper 1 addresses a fundamental question about AI's impact on human skill development with rigorous experimental methodology, producing generalizable insights about AI-as-complement vs AI-as-substitute that have broad implications across education, workforce development, and AI policy. Its findings on AI usage intensity and informativeness mediating learning outcomes are novel and timely, relevant to nearly every domain where AI assistance is deployed. Paper 2, while creative in its pedagogical approach, is more niche—focused on a specific classroom practice and benchmark artifact with narrower applicability and less generalizable scientific contributions.
Paper 1 is more methodologically innovative and timely for autonomous systems: it proposes a concrete, training-free latent-communication framework (ILD, CHSA, SSKD) addressing clear bottlenecks (latency, information loss, identity confusion) and validates it in closed-loop CARLA, implying near-term deployment relevance for connected AVs and multi-agent robotics. Its ideas may generalize to other multi-agent settings (robot swarms, decentralized inference), broadening impact. Paper 2 is important and applicable to education/policy, but likely less novel methodologically and more context-dependent; rigor is hard to judge from the abstract alone.
Paper 1 likely has higher scientific impact due to a more novel, causally oriented contribution: controlled experiments on AI usage/informativeness and downstream skill development, with mediating mechanisms and heterogeneous effects. This is timely and broadly relevant to education, human-AI interaction, labor economics, and policy (e.g., regulating AI access). Its methodological rigor appears stronger than Paper 2’s small-N, single-firm qualitative interviews, which are valuable for insight and hypothesis generation but have limited generalizability and weaker causal claims, thus narrower scientific reach.
While Paper 1 presents a strong technical advancement in AI agent architecture, Paper 2 addresses a critical and highly timely societal issue: the impact of AI on human skill development. Its findings have far-reaching implications across multiple disciplines, including education, cognitive psychology, human-computer interaction, and AI policy, giving it a broader potential scientific and real-world impact compared to the specialized algorithmic improvements in Paper 1.
Paper 1 likely has higher scientific impact due to a more novel technical contribution (structured experience graphs for self-evolving LLM agents), clear methodology with benchmarked performance/efficiency gains, and broad applicability across agentic systems, continual learning, memory/knowledge representation, and software automation. Its timeliness is high given rapid adoption of deployable agents and the need for scalable improvement mechanisms. Paper 2 is important and relevant for human-AI learning and policy, but its impact may be narrower and more context-dependent (specific task/experimental setting) and less likely to generalize into widely reusable methods or systems.
Paper 1 addresses a highly timely and broadly relevant issue—how AI affects human skill development—with significant real-world implications for education, policy, and cognitive science. While Paper 2 offers impressive algorithmic improvements (orders of magnitude) in POMDP planning, its impact is largely confined to the specific subfield of robotics and decision-making. Paper 1's cross-disciplinary appeal and societal relevance give it higher potential scientific impact.
Paper 1 likely has higher scientific impact due to broader cross-field relevance and timeliness: it addresses how AI assistance affects human skill acquisition, a central question for education, workplace training, human–AI interaction, and AI governance. The experimental, controlled-task design and mediation via “informativeness” suggest stronger causal/methodological rigor than a primarily observational/computational humanities comparison. Its findings have direct real-world implications for regulating AI access and designing assistive systems. Paper 2 is innovative and valuable within digital humanities, but its impact is likely narrower and more domain-specific.
Paper 2 addresses an extremely timely and broad societal issue: the impact of AI on human learning and skill development. Its findings have wide-ranging implications across education, psychology, human-computer interaction, and AI policy. While Paper 1 provides an innovative computational framework for digital humanities, its primary impact is largely confined to history and archival studies, giving Paper 2 a significantly higher potential for cross-disciplinary and real-world scientific impact.
Paper 1 addresses a highly timely and widely relevant issue: the impact of AI on human learning and skill development. Its findings have broad implications across education, cognitive science, human-computer interaction, and AI policy. In contrast, Paper 2 focuses on a very niche theoretical advancement in answer set programming. While methodologically rigorous, Paper 2's impact is largely confined to a specific subfield of logic programming, whereas Paper 1 has significant real-world applicability and interdisciplinary appeal.
Paper 1 likely has higher scientific impact: it introduces a novel, timely benchmark for end-to-end LLM agent spreadsheet generation in high-stakes finance, with a multidimensional evaluation taxonomy that can become a standard tool for model assessment and drive measurable progress across agentic AI, HCI, and enterprise automation. Its applications are concrete and immediate (auditing, reliability, workflow automation), and benchmarks typically catalyze broad follow-on research. Paper 2 is important and relevant, but its contribution (AI assistance effects on learning) is less methodologically distinctive from prior human-AI/education studies and may have narrower generalizability beyond the specific task setting.
Paper 1 addresses a timely and broadly impactful question about how AI usage affects human skill development, with implications spanning education, workforce training, and AI policy. Its findings—that heavy AI use can substitute for rather than complement learning—have immediate real-world relevance as AI tools become ubiquitous. Paper 2 makes a solid technical contribution to guided sampling in flow/diffusion models, but its impact is more narrowly scoped to the generative modeling community. Paper 1's breadth of societal impact and timeliness give it higher potential scientific impact.
Paper 2 addresses the timely and broadly relevant question of how AI usage affects human skill development, with empirical findings from controlled experiments. Its implications span education, cognitive science, AI policy, and workforce development, giving it wide interdisciplinary appeal. The finding that AI can either complement or substitute for human reasoning, depending on informativeness and usage intensity, has immediate practical applications for AI tool design and educational policy. Paper 1, while intellectually interesting, is more niche—proposing a theoretical framework for knowledge graph re-engineering with limited empirical validation beyond a case study, and targeting a narrower community.
Paper 2 likely has higher impact: it introduces a reusable, conversation-grounded EI benchmark with participant-provided turn-by-turn annotations, enabling standardized evaluation and model comparison across many systems—high novelty, timeliness, and broad applicability to NLP, HCI, safety, and alignment. The benchmark can directly influence model development and deployment practices. Paper 1 addresses an important question (AI assistance and skill development) with real-world relevance, but appears narrower in scope (a controlled logical reasoning task) and its impact may depend on generalizability beyond the specific experimental setting.
Paper 1 addresses a fundamentally important and timely question about how AI usage affects human skill development, with broad implications across education, workforce policy, and AI governance. Its findings that AI can either complement or substitute for human reasoning, depending on informativeness and usage patterns, are highly relevant to ongoing societal debates about AI integration. Paper 2, while technically solid and useful for LLM inference optimization, addresses a narrower systems-level problem (KV cache management) that is more incremental and has a smaller audience. Paper 1's cross-disciplinary relevance and policy implications give it greater potential impact.
Paper 1 has higher likely scientific impact due to broader cross-domain relevance (education, human-AI interaction, cognition, policy), high timeliness given rapid AI adoption, and clearer causal/experimental framing around how AI assistance affects skill development. Its findings can inform AI tool design and regulation across many settings. Paper 2 is innovative and applied, but is more domain-specific (circular manufacturing IT/CPPS architectures) and design-science evaluations often have narrower citation and adoption outside industrial engineering, despite solid real-world applicability.
Paper 1 addresses a fundamental and timely question about how AI usage affects human skill development, with broad implications across education, workforce training, and AI policy. Its findings—that AI can either complement or substitute for human learning depending on informativeness—have wide applicability and relevance to ongoing societal debates about AI integration. Paper 2, while technically solid, is a more incremental engineering contribution focused on a specific application (spreadsheet automation via RL fine-tuning), with narrower impact scope and less conceptual novelty. Paper 1's insights are more likely to influence multiple fields and policy discussions.