How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

Shuqi Zhu, Yi Zhong, Ziyi Ye, Bangde Du, Yujia Zhou, Qingyao Ai, Yiqun Liu

May 16, 2026

arXiv:2605.16953v1

cs.AI (primary), cs.CL

#748 of 2292 · Artificial Intelligence

Tournament Score: 1448 ± 44 (scale 1050 to 1800)

Win Rate: 63%

Rating: 5.5 / 10

Significance: 6 · Rigor: 5 · Novelty: 6.5 · Clarity: 6.5

Abstract

While AI-generated hallucinations pose considerable risks, the underlying cognitive mechanisms by which humans can successfully recognize or be misled by these hallucinations remain unclear. To address this problem, this paper explores humans' neural dynamics to characterize how the brain processes hallucinated content. We record EEG signals from 27 participants while they are performing a verification task to judge the correctness of image descriptions generated by a multi-modal large language model (MLLM). Based on an averaged event-related potential (ERP) study, we reveal that multiple cognitive processes, e.g., semantic integration, inferential processing, memory retrieval, and cognitive load, exhibit distinct patterns when humans process hallucinated versus non-hallucinated content. Notably, neural responses to hallucinations that were misjudged versus correctly judged by human participants showed significant differences. This indicates that misjudged AI-generated hallucinations failed to trigger the standard neurocognitive fact verification pathway.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper investigates the neural mechanisms underlying human processing of AI-generated hallucinations using EEG/ERP methodology. The study records brain signals from 27 participants as they evaluate whether image descriptions generated by a multimodal LLM (Qwen2.5-VL-3B-Instruct) match presented images. The core novelty lies in bridging neuroscience and AI hallucination research — moving beyond behavioral studies to examine *when* and *how* the brain detects (or fails to detect) hallucinated content at millisecond resolution. The key finding is that correctly identified hallucinations elicit enhanced ERP components across multiple processing stages (N100, P200, N400, P600), while hallucinations that fooled participants show no significant neural differences from non-hallucinated content.

Methodological Rigor

The experimental design is competent but has notable limitations. The use of 27 participants with a controlled stimulus set (60 image-response pairs, 120 sentences total) follows standard EEG practices, and the power analysis confirms adequate sensitivity for medium-to-large effects. The preprocessing pipeline (re-referencing, filtering, epoching) and the GFP-based time window identification follow established conventions.
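For readers unfamiliar with the GFP step mentioned above, the following is a minimal sketch of how global field power is commonly computed: the standard deviation across electrodes at each time sample of an averaged ERP. The function names, the toy data, and the fixed-threshold rule for picking windows are illustrative assumptions; the review does not state how the paper derived its windows from the GFP curve.

```python
from statistics import pstdev


def global_field_power(erp):
    """Return GFP per time sample for an averaged ERP.

    erp: list of channels, each a list of voltages (one per time sample).
    GFP(t) is the population standard deviation across channels at time t.
    """
    n_samples = len(erp[0])
    if any(len(channel) != n_samples for channel in erp):
        raise ValueError("all channels must have the same number of samples")
    return [pstdev([channel[t] for channel in erp]) for t in range(n_samples)]


def windows_above(gfp, threshold):
    """Hypothetical helper: contiguous (start, end) sample ranges, inclusive,
    where GFP is at or above a fixed threshold."""
    windows, start = [], None
    for t, value in enumerate(gfp):
        if value >= threshold and start is None:
            start = t
        elif value < threshold and start is not None:
            windows.append((start, t - 1))
            start = None
    if start is not None:
        windows.append((start, len(gfp) - 1))
    return windows


# Toy data (made up): 3 channels, 5 time samples, one peak in the middle.
erp = [
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [0.0, -1.0, -4.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
gfp = global_field_power(erp)
windows = windows_above(gfp, threshold=0.5)
print(windows)  # [(1, 3)]
```

Because GFP is reference-independent and collapses all electrodes into one curve, it is a common way to choose analysis windows without first committing to particular channels.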

However, several methodological concerns arise:

1. Stimulus presentation: Words are presented at 750ms each, which is substantially slower than natural reading (~250ms/word). This artificial pace may alter cognitive processing dynamics, limiting ecological validity.

2. Condition imbalance: The HalluWrong condition naturally has far fewer trials than HalluCorrect (participants were ~84% accurate), creating uneven statistical power across the critical comparisons. The paper does not adequately address how trial count differences affect ERP averaging quality.

3. Multiple comparisons: While FDR correction is applied to post-hoc tests, the paper tests 7 brain regions × 4 time windows × 3 pairwise comparisons, creating a substantial multiple testing burden. Some reported effects are marginal after correction.

4. Circularity concern in prediction experiments: The feature selection (choosing ROIs based on significant ERP effects) was performed on the same dataset used for prediction, potentially inflating classification performance. The authors address this partially in the appendix, but the concern remains.

5. Null-result interpretation: The null finding for HalluWrong vs. NoHallu is interesting, but null results require cautious interpretation: absence of evidence is not evidence of absence, particularly given the reduced trial counts in the HalluWrong condition.
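The multiple-testing burden in concern 3 can be made concrete with a small sketch of the Benjamini-Hochberg FDR procedure applied to 7 × 4 × 3 = 84 tests. The p-values below are invented for illustration, and treating all 84 tests as a single family is an assumption; the review does not say how the paper grouped its tests for correction.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# 7 brain regions x 4 time windows x 3 pairwise comparisons.
n_tests = 7 * 4 * 3

# Made-up p-values: one strong effect, two marginal ones, the rest null.
p_values = [0.0004, 0.03, 0.04] + [0.5] * (n_tests - 3)
rejected = benjamini_hochberg(p_values)

print(n_tests)               # 84
print(rejected.count(True))  # 1
```

In this toy case the two effects with uncorrected p-values of 0.03 and 0.04 do not survive correction, which is the sense in which marginal effects become fragile as the number of tests grows.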

Potential Impact

The paper has interdisciplinary appeal, sitting at the intersection of AI safety, cognitive neuroscience, and human-computer interaction. Several impact pathways exist:

AI system design: The finding that hallucination detection involves multiple cognitive stages (attention → semantic integration → memory retrieval → reanalysis) could inform multi-stage hallucination detection architectures.

Human-AI interaction: The observation that undetected hallucinations fail to trigger any anomalous neural response is practically important — it suggests that fluent hallucinations bypass cognitive safeguards entirely rather than being subconsciously registered but behaviorally ignored.

EEG-based implicit feedback: The prediction experiments (AUC ~0.93-0.98 for HalluCorrect vs. NoHallu) suggest potential for neural-signal-based content verification systems, though the requirement for correct recognition limits practical utility.

However, the practical applicability is constrained by the laboratory setting, EEG equipment requirements, and the fundamental limitation that EEG-based detection only works when humans already correctly identify hallucinations.
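As a reminder of what the AUC figures quoted above measure, here is a minimal sketch that computes ROC AUC directly from its pairwise definition: the probability that a randomly chosen positive trial receives a higher classifier score than a randomly chosen negative one. The scores are invented, and the labels "HalluCorrect" and "NoHallu" are attached to them purely for illustration; nothing here reproduces the paper's classifier or data.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC from its pairwise definition; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up classifier scores for two trial types.
hallu_correct_scores = [0.9, 0.8, 0.7, 0.4]  # treated as the positive class
no_hallu_scores = [0.6, 0.3, 0.2, 0.1]       # treated as the negative class

auc = roc_auc(hallu_correct_scores, no_hallu_scores)
print(auc)  # 0.9375
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so values in the 0.93 to 0.98 range indicate that the two trial types are highly separable in the recorded signal.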

Timeliness & Relevance

The paper addresses a highly timely topic. AI hallucination is one of the most pressing challenges in LLM deployment, and understanding human vulnerability to hallucinations is critical for safe AI deployment. The neuroscience perspective is genuinely underexplored in this space, making this a relevant contribution. The publication at ICML 2026 is appropriate given the growing interest in human factors within the ML community.

Strengths

1. Novel research question: This is among the first studies to examine AI hallucination processing through neuroimaging, creating a new research direction.

2. Interesting asymmetry finding: The distinction between correctly and incorrectly judged hallucinations provides genuine insight — that undetected hallucinations produce neural signatures indistinguishable from truthful content is a substantive finding with implications for AI safety.

3. Multi-level analysis: Combining ERP analysis with prediction experiments provides complementary evidence.

4. Open data and code: The commitment to releasing the EEG dataset is valuable for reproducibility and community building.

Limitations

1. Limited novelty in ERP methodology: The ERP findings largely recapitulate known effects (N400 for semantic violation, P600 for reanalysis) applied to a new stimulus type. The interpretive framework doesn't substantially advance neurocognitive theory.

2. Confound between task demands and hallucination processing: Participants were explicitly instructed to judge correctness. The enhanced ERPs for correctly identified hallucinations may simply reflect successful task engagement rather than hallucination-specific processing. A passive viewing condition would help disambiguate this.

3. Small and homogeneous stimulus set: Only 60 image-response pairs from a single MLLM, with relatively shallow hallucination types (entity, attribute, relation). Generalizability to diverse hallucination patterns and models is uncertain.

4. The prediction experiment's practical value is limited: If EEG can only detect hallucinations that humans already correctly identify, the added value over simple behavioral responses is unclear.

5. No modeling of individual differences: Given variation in accuracy (85-112 correct items), examining what predicts susceptibility would strengthen the contribution.

6. Comparison with AI-based methods (Table 10): Comparing averaged EEG from 27 participants against automated methods is not a fair comparison and the claimed superiority is misleading.

Overall Assessment

This paper opens an interesting and timely research direction by applying neuroscience methods to understand human processing of AI hallucinations. The core finding — that undetected hallucinations produce no distinguishable neural signature — is the most impactful result, with clear implications for AI safety. However, the methodological execution has notable limitations, the ERP findings are somewhat predictable given existing literature, and the practical implications remain constrained. It represents a solid first step in an important direction rather than a definitive contribution.

Generated May 19, 2026

Comparison History (19)

vs. Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

claude-opus-4.6 · 5/22/2026

Paper 2 offers greater scientific impact due to its novel interdisciplinary approach combining neuroscience and AI, revealing fundamental cognitive mechanisms underlying human processing of AI hallucinations. This opens new research directions at the intersection of cognitive science, HCI, and AI safety. While Paper 1 addresses an important applied problem (AI in conflict contexts) and provides a useful evaluation framework, it is more domain-specific. Paper 2's findings about neural pathways for fact verification have broader implications for AI system design, human-AI interaction, and understanding cognitive vulnerabilities, with potential applications across many fields.

vs. Beyond Mode Collapse: Distribution Matching for Diverse Reasoning

gpt-5.2 · 5/20/2026

Paper 1 offers a broadly applicable methodological advance for on-policy RL (forward-KL-inspired distribution matching to reduce mode collapse) with demonstrated gains across combinatorial optimization, math reasoning, and modalities, suggesting strong downstream impact on training LLM/RL reasoning systems. It is timely for RLHF-style optimization and has clear real-world applications. Paper 2 is novel and relevant but has narrower scope and limited sample size (n=27) typical of EEG studies, making generalization and immediate translational impact less certain. Overall, Paper 1 likely yields wider cross-field adoption.

vs. How Far Are We From True Auto-Research?

gemini-3.1 · 5/20/2026

Paper 1 evaluates a highly critical and timely topic (AI-generated scientific research), exposing major flaws in current evaluation methods. Its introduction of artifact-aware peer review and analysis of failure modes will broadly impact how AI research agents are developed and assessed. Paper 2, while offering novel neuroscientific insights into AI hallucinations, has a narrower scope and smaller sample size (N=27), limiting its broader methodological impact compared to the systemic benchmark established in Paper 1.

vs. Latent Heuristic Search: Continuous Optimization for Automated Algorithm Design

gemini-3.1 · 5/19/2026

Paper 2 pioneers a highly interdisciplinary intersection of cognitive neuroscience and AI safety. By investigating the neural dynamics of how humans process AI hallucinations, it addresses a critical socio-technical issue with broad implications for HCI, cognitive psychology, and AI alignment. While Paper 1 offers a strong methodological advance in automated algorithm design, Paper 2's novel empirical approach to understanding human vulnerability to AI hallucinations provides wider potential scientific impact across multiple disciplines.

vs. Voices in the Loop: Mapping Participatory AI

gemini-3.1 · 5/19/2026

Paper 1 demonstrates higher scientific impact due to its high novelty and cross-disciplinary innovation. By intersecting cognitive neuroscience with AI safety, it provides foundational empirical insights into a critical, timely problem: human susceptibility to AI hallucinations. Uncovering the neural mechanisms of how humans process and are misled by AI-generated content opens new avenues for human-computer interaction, cognitive science, and AI alignment. In contrast, while Paper 2 provides a valuable repository for policy and ethics, it is fundamentally a mapping exercise and lacks the empirical breakthrough potential of Paper 1's neuro-cognitive discoveries.

vs. BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

claude-opus-4.6 · 5/19/2026

Paper 2 has higher potential scientific impact due to its novelty in bridging neuroscience and AI safety, opening an entirely new research direction on how humans cognitively process AI hallucinations. This interdisciplinary work has broad implications for AI safety, human-computer interaction, cognitive science, and policy-making around AI deployment. Paper 1, while practically useful, represents an incremental engineering improvement to MoE routing efficiency. Paper 2's findings about distinct neural pathways for misjudged vs. correctly identified hallucinations provide fundamental insights that could influence multiple fields.

vs. It's not the Language Model, it's the Tool: Deterministic Mediation for Scientific Workflows

claude-opus-4.6 · 5/19/2026

Paper 2 opens a novel interdisciplinary research direction at the intersection of neuroscience and AI safety, providing foundational insights into how the human brain processes AI hallucinations. This has broad implications for AI interface design, cognitive science, and AI safety policy. Paper 1, while practically valuable for reproducible scientific workflows, presents more of an engineering pattern (typed mediation) with narrower scope. Paper 2's novelty in applying neuroimaging to understand human-AI interaction is more likely to inspire follow-up research across multiple fields.

vs. AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

gemini-3.1 · 5/19/2026

Paper 1 addresses a critical, high-stakes gap in medical AI safety by introducing a unified benchmark for clinical triage. Its direct real-world implications for patient safety and the deployment of healthcare LLMs offer broader and more immediate practical impact than the exploratory neuroimaging study in Paper 2, despite Paper 2's high novelty in cognitive science.

vs. When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration

gemini-3.1 · 5/19/2026

Paper 2 challenges a fundamental assumption in modern AI engineering (that more context improves agent performance) with a robust empirical study of over 2,700 runs. Its finding that context can actively degrade performance, alongside a highly predictive diagnostic metric, has immediate and widespread implications for the design of multi-agent systems and Retrieval-Augmented Generation (RAG). While Paper 1 is highly novel in bridging neuroscience and AI safety, Paper 2 offers greater methodological scale and broader, more actionable real-world impact across the rapidly growing field of LLM deployment.

vs. MADP: A Multi-Agent Pipeline for Sustainable Document Processing with Human-in-the-Loop

claude-opus-4.6 · 5/19/2026

Paper 2 investigates a fundamentally novel question—how the human brain processes AI-generated hallucinations—using neuroimaging (EEG/ERP), bridging neuroscience and AI safety. This interdisciplinary contribution opens new research directions in understanding human-AI interaction at a cognitive level, with broad implications for AI safety, interface design, and cognitive science. Paper 1, while practically useful, is primarily an engineering contribution describing a document processing pipeline with incremental improvements, limited novelty beyond system integration, and evaluation on a narrow use case.

vs. A Global-Local Graph Attention Network for Traffic Forecasting

gemini-3.1 · 5/19/2026

Paper 1 addresses a highly timely and critical issue—AI hallucinations—through an innovative, interdisciplinary approach combining neuroscience, cognitive science, and human-computer interaction. Understanding the neural mechanisms of how humans process AI hallucinations offers profound implications for AI safety, trust, and human-AI alignment. In contrast, Paper 2, while methodologically sound, presents a more incremental algorithmic advancement in graph neural networks for a well-established application (traffic forecasting). Paper 1's broader novelty and cross-disciplinary relevance give it a significantly higher potential for widespread scientific impact.

vs. Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

claude-opus-4.6 · 5/19/2026

Paper 1 is more novel and interdisciplinary, pioneering the neuroscientific investigation of how humans process AI-generated hallucinations using EEG/ERP methodology. It opens a new research direction at the intersection of cognitive neuroscience and AI safety, with broad implications for understanding human-AI interaction, designing better AI systems, and informing AI literacy. Paper 2, while practically useful, applies well-established ensemble methods to AI monitoring—an incremental engineering contribution. Paper 1's unique cross-disciplinary approach and fundamental insights into human cognition give it greater potential for broad scientific impact.

vs. A Conflict-aware Evidential Framework for Reliable Sleep Stage Classification

gpt-5.2 · 5/19/2026

Paper 1 is more novel and timely, addressing the high-impact societal problem of AI hallucinations with neuroimaging evidence about human verification pathways. Its findings could influence HCI, cognitive neuroscience, AI safety, and evaluation/training of multimodal LLMs, giving broader cross-field impact than a domain-specific sleep staging method. While Paper 2 is methodologically solid and application-ready for clinical sleep analysis, conflict-aware multi-view aggregation is a more incremental ML contribution with narrower scope. Paper 1’s mechanistic insight into when humans are misled may generalize widely and shape future mitigation strategies.

vs. KISS - Knowledge Infrastructure for Scientific Simulation: A Scaffolding for Agentic Earth Science

claude-opus-4.6 · 5/19/2026

Paper 1 demonstrates higher potential scientific impact due to its broad applicability across 14 Earth-science domains with 119 knowledge infrastructures, its novel framework for democratizing complex process-based simulation models, and its rigorous 3,000-trial benchmark. It addresses a critical real-world need (climate risk accessibility) with a generalizable, scalable solution. Paper 2, while innovative in applying neuroimaging to AI hallucination detection, has a smaller sample size (27 participants), narrower scope, and more incremental contribution to understanding human-AI interaction without offering a transformative solution.

vs. ScreenSearch: Uncertainty-Aware OS Exploration

gemini-3.1 · 5/19/2026

Paper 2 bridges artificial intelligence, cognitive neuroscience, and human-computer interaction to address the highly timely issue of AI hallucinations. By revealing the neural mechanisms behind human susceptibility to these errors, it offers broad, cross-disciplinary implications for AI safety, interface design, and cognitive science. In contrast, Paper 1 presents a solid but more specialized technical advancement in the narrower subfield of GUI agent navigation and state exploration.

vs. TopoEvo: A Topology-Aware Self-Evolving Multi-Agent Framework for Root Cause Analysis in Microservices

claude-opus-4.6 · 5/19/2026

Paper 2 has higher potential scientific impact due to its broader interdisciplinary relevance spanning neuroscience, cognitive science, and AI safety. It addresses a fundamental question about human-AI interaction—how the brain processes AI hallucinations—which is timely and relevant across multiple fields. The neuroimaging approach provides novel mechanistic insights into why humans are misled by AI, with implications for AI system design, trust calibration, and AI safety policy. Paper 1, while technically sophisticated, addresses a narrower DevOps/microservices problem with incremental advances over existing RCA methods.

vs. CAM-Bench: A Benchmark for Computational and Applied Mathematics in Lean

gemini-3.1 · 5/19/2026

Paper 1 pioneers a novel intersection of neuroscience and AI safety, investigating the cognitive mechanisms behind human susceptibility to AI hallucinations. This interdisciplinary approach offers profound implications for cognitive science, human-computer interaction, and mitigating AI risks in real-world applications. Paper 2 provides a valuable but more niche benchmark for formal theorem proving in applied mathematics. Due to its broader relevance, urgency regarding AI safety, and innovative methodology, Paper 1 has a higher potential for widespread scientific impact across multiple fields.

vs. CBT-Audio: Evaluating Audio Language Models for Patient-Side Distress Intensity Estimation in CBT Session Recordings

gpt-5.2 · 5/19/2026

Paper 1 is likely to have higher impact because it releases a scarce, ethically difficult-to-obtain evaluation dataset for spoken CBT, enabling reproducible benchmarking of audio language models in mental health—an area with clear real-world application. It tests multiple models and input modalities with expert-validated labels, offering actionable evidence that audio adds value beyond transcripts. Paper 2 is novel and timely but has narrower generalizability (EEG, n=27, specific MLLM task) and yields mainly descriptive neurocognitive findings, which may translate less directly into broadly adopted methods or resources.

vs. The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure

gemini-3.1 · 5/19/2026

Paper 1 addresses a highly pressing issue in AI safety with a counter-intuitive finding (the capability paradox) and a large-scale empirical evaluation (over 40,000 trials). Its proposed mitigation provides immediate practical value for designing multi-agent LLMs. While Paper 2 offers an interesting interdisciplinary approach, its small sample size (27 participants) and focus on neural correlates offer less immediate, broad impact on the rapidly evolving landscape of AI development compared to Paper 1.