Beyond representational alignment with brain-guided language models for robust reasoning

Mingqing Xiao, Kai Du, Zhouchen Lin

Jun 10, 2026 · arXiv:2606.11893v1

cs.LG · cs.AI · cs.CL · q-bio.NC

#1040 of 5669 · cs.LG

Tournament Score: 1471 ± 44

Win Rate: 64%

Rating: 7.5 / 10

Significance 8 · Rigor 7.5 · Novelty 8.5 · Clarity 7.5

Abstract

The correspondence between large language models (LLMs) and the neural mechanisms underlying human higher-order cognition remains insufficiently characterized. Given that language and reasoning in the human brain appear dissociable, an open question is whether LLMs align with neural signals from reasoning-related regions and whether such signals can improve them. Here, focusing on deductive reasoning, we show that LLM internal representations are not only partially aligned with task-fMRI activity but can also be directly enhanced by these signals. Using a neural-predictivity metric, we find that LLMs explain a substantial fraction of the explainable variance in reasoning-related regions at the aggregate level, whereas predictivity within specific reasoning types is lower, indicating both alignment and divergence. Building on this, we propose a brain-guided framework: we steer model representations along directions induced by the joint structure of model and brain representations, applying intervention at inference and fine-tuning during training. We demonstrate that task-evoked brain signals can directly enhance LLM reasoning, yielding gains orthogonal to language-only supervision across 10 LLMs (1.5B-72B), with transfer across reasoning types and up to 13% absolute accuracy gain. Our results advance LLM-brain correspondences from correlation to guidance, establishing a brain-signal-driven pathway toward more robust and cognitively aligned AI.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

Core Contribution

This paper makes two interconnected contributions. First, it demonstrates that LLM internal representations are partially aligned with human fMRI activity in deductive reasoning-related brain regions (explaining ~76% of explainable variance at the aggregate level, dropping to ~27% within specific reasoning types). Second, and more significantly, it proposes a brain-guided framework (NARI for inference-time intervention and NARF for fine-tuning) that uses task-evoked fMRI signals to steer LLM representations and improve reasoning performance. The key insight is treating neural data not merely as a validation target for correlation analysis, but as a functional training signal—moving from "correlation to guidance."
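As a rough illustration of what a ceiling-normalized predictivity score means, the sketch below divides a raw prediction score by a noise ceiling. It assumes the raw score is a Pearson correlation and that the ceiling is expressed on the same scale; the paper may use a variance-based variant, and all numbers here are hypothetical toy values, not results from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def normalized_predictivity(predicted, measured, ceiling):
    """Raw predictivity divided by a noise ceiling (assumed to be on the same scale)."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return pearson_r(predicted, measured) / ceiling

# Hypothetical responses of one voxel to five stimuli (illustrative only).
measured = [0.2, 0.5, 0.1, 0.9, 0.4]
predicted = [0.4, 0.3, 0.3, 0.6, 0.6]
score = normalized_predictivity(predicted, measured, ceiling=0.8)
```

A score near 1 would mean the model predicts the voxel about as well as the data's own reliability allows, which is the sense in which an aggregate figure such as ~76% should be read.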

The core mechanism is elegant: a ridge regression maps LLM hidden states to fMRI space, and gradients of a similarity objective with respect to model representations yield steering directions jointly induced by both model and brain representational structures. The mathematical derivation (Sec. S5) reveals that these directions depend on the Gram matrices of centered model representations, neural representations, and their cross-structure—providing a principled basis for why neural signals add information beyond what model structure alone provides.
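The idea of deriving a steering direction from a model-to-brain map can be sketched in a few lines. This is a simplified stand-in, not the paper's derivation: it assumes a linear map `W` that has already been fit (the ridge fit itself is omitted), a single stimulus, and a cosine-similarity objective, whereas the paper's directions depend on Gram matrices of centered representations over a set of stimuli. The names `W`, `h`, `y`, and `alpha` are hypothetical.

```python
import math

def matvec_T(W, h):
    """Map a hidden state h (length d) to voxel space (length v): p = W^T h."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) for j in range(len(W[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def steering_direction(W, h, y):
    """Unit-norm gradient of cosine(W^T h, y) with respect to h.

    Assumes the gradient is nonzero (the prediction is not already parallel to y).
    """
    p = matvec_T(W, h)
    norm_p, norm_y = math.sqrt(dot(p, p)), math.sqrt(dot(y, y))
    py = dot(p, y)
    # d cos / d p = y / (|p||y|) - (p.y) p / (|p|^3 |y|)
    g_p = [y[j] / (norm_p * norm_y) - py * p[j] / (norm_p ** 3 * norm_y)
           for j in range(len(p))]
    # Chain rule through the linear map: d cos / d h = W g_p
    g_h = [dot(W[i], g_p) for i in range(len(h))]
    norm = math.sqrt(dot(g_h, g_h))
    return [g / norm for g in g_h]

def steer(h, direction, alpha):
    """Shift the hidden state by alpha along the steering direction."""
    return [hi + alpha * di for hi, di in zip(h, direction)]

# Hypothetical toy values: 3 hidden units, 2 voxels.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h = [1.0, 0.2, 0.0]
y = [0.0, 1.0]

direction = steering_direction(W, h, y)
h_new = steer(h, direction, alpha=0.1)
before = cosine(matvec_T(W, h), y)
after = cosine(matvec_T(W, h_new), y)
```

In this toy setting a small step along the direction raises the similarity between the mapped hidden state and the target pattern; whether that translates into better reasoning is the empirical question the paper addresses.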

Methodological Rigor

The experimental design is thorough with multiple controls and ablations. Key strengths include:

Proper baselines: Random signals (preserving model structure but eliminating neural structure) and random directions isolate the contribution of neural data.

Ceiling-normalized predictivity: Following established neuroscience methodology for brain scores with proper ceiling estimation.

Controlled stimuli: The fMRI dataset uses pseudowords to minimize semantic confounds—crucial for isolating reasoning from language.

Comprehensive evaluation: Testing across 10 LLMs (1.5B-72B), multiple reasoning types, premise permutations, and premise counts.

Statistical rigor: Paired t-tests, multiple random seeds, m.a.d. error bars.
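For readers less familiar with the statistics named above, here is a minimal sketch of a paired t statistic and a dispersion measure. It assumes "m.a.d." means median absolute deviation (it could also denote mean absolute deviation), and the per-seed accuracies are hypothetical, not taken from the paper.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def median_abs_deviation(values):
    """Median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Hypothetical accuracies over five random seeds (illustrative only).
baseline = [0.70, 0.72, 0.68, 0.71, 0.69]
steered = [0.74, 0.75, 0.73, 0.74, 0.72]

t_stat = paired_t_statistic(steered, baseline)
spread = median_abs_deviation(steered)
```

Pairing by seed matters here: the test looks at per-seed differences, so seed-to-seed variation that affects both conditions equally does not inflate the error term.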

However, several methodological concerns warrant attention. The fMRI dataset is relatively small (10 subjects after exclusions, 70 problems), raising questions about generalizability of the neural signals. The intervention approach requires hyperparameter tuning (perturbation range α, scale factor γ) that is model-specific, and the authors acknowledge representation steering reliability issues. The 100% success rate for NARI on incorrect problems is achieved with up to 200 optimization steps and subject-aggregation, which somewhat obscures the difficulty of finding effective directions. The extension to the HCP relational processing task (achieving ~80% success) provides important but partial external validation.

The ablation showing that neither model structure alone (random signals) nor the systematic shift alone achieves optimal results—but their combination does—is convincing evidence that the method genuinely leverages neural information.

Potential Impact

This work opens several impactful directions:

1. NeuroAI paradigm shift: Moving from descriptive alignment metrics to prescriptive neural guidance for AI improvement represents a conceptual advance. Previous work used neural signals for vision robustness or speech understanding; this extends to higher-order cognition.

2. Complementary training signals: The demonstration that NARF provides gains orthogonal to language supervision (Fig. 6) suggests neural data could serve as a fundamentally different form of "process supervision"—guiding intermediate representations rather than outputs.

3. Practical implications: Up to 13% absolute accuracy gains on propositional reasoning, transfer across reasoning types, and compatibility with standard training pipelines suggest practical utility, though the requirement for task-matched fMRI data limits immediate scalability.

4. Cognitive science implications: The finding that LLMs align more with reasoning networks than language networks challenges the view that LLMs merely capture linguistic patterns, contributing to the ongoing debate about whether LLMs develop genuine internal representations beyond surface statistics.

Timeliness & Relevance

The paper is exceptionally timely. With frontier LLMs still failing on simple out-of-distribution reasoning tasks, and the field heavily investing in language-based reasoning (chain-of-thought, reinforcement learning), this offers an orthogonal approach. The inclusion of DeepSeek-R1-Distill experiments bridges the gap to current thinking models. The dissociation between language and reasoning in the brain directly motivates why language-only training might be insufficient—a hypothesis gaining traction in the field.

Strengths

Novel conceptual framework: The transition from alignment-as-metric to alignment-as-guidance is the paper's most important contribution.

Comprehensive experimental validation: 10 models, multiple reasoning types, transfer experiments, ablations, two fMRI datasets.

Mathematical clarity: The gradient derivation explicitly shows how model and brain structures jointly determine steering directions.

Practical compatibility: NARF integrates seamlessly with standard LoRA fine-tuning and language supervision.

No catastrophic forgetting: General capabilities remain stable (Table S1).

Limitations

Data scale: 10 subjects, 70 problems is small; the method's effectiveness with richer neural datasets remains speculative.

Task specificity: Basic deductive reasoning with pseudowords is far from the complexity of real-world reasoning tasks.

Temporal limitations of fMRI: The authors acknowledge hemodynamic response limitations for tracking fast reasoning processes.

Steering reliability: Model-specific hyperparameter sensitivity and acknowledged instability of representation steering limit plug-and-play applicability.

Modest absolute gains in combined setting: The 2.2% average gain for NARF+Label over Label alone, while statistically significant, is modest for practical applications.

Causality questions: Whether the improvements truly reflect "cognitive" information versus task-correlated statistical patterns in fMRI remains debatable.

Overall Assessment

This paper is a creative and well-executed contribution that moves NeuroAI from descriptive alignment analysis to prescriptive guidance. While the practical impact is currently constrained by data availability and task complexity, the conceptual contribution (demonstrating that cognitive brain signals can causally improve AI reasoning) is significant and likely to inspire substantial follow-up work across both the neuroscience and AI communities.

Rating: 7.5 / 10

Significance 8 · Rigor 7.5 · Novelty 8.5 · Clarity 7.5

Generated Jun 11, 2026

Comparison History (22)

Lost vs. MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

MaxProof demonstrates unprecedented performance on elite mathematical competitions (IMO 2025, USAMO 2026), exceeding human gold-medal thresholds. This represents a landmark achievement in AI mathematical reasoning with immediate, verifiable real-world impact. The framework combining generative-verifier RL with population-level test-time scaling introduces practical innovations with broad applicability. While Paper 1 presents an interesting neuroscience-AI bridge concept, its gains (up to 13% accuracy) are more incremental, and the brain-guided approach faces scalability limitations. Paper 2's results are more transformative for the field and will likely attract significantly more attention and follow-up work.

claude-opus-4-6 · Jun 12, 2026

Lost vs. The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

Paper 1 provides a novel geometric framework (projection caustics) explaining phase transitions in diffusion models, connecting differential geometry to generative AI dynamics. It offers both theoretical insight and practical tools (CBD). While Paper 2 is innovative in using brain signals to guide LLMs, its reliance on fMRI data limits scalability and practical adoption. Paper 1's theoretical contributions have broader implications for understanding and controlling the rapidly growing class of diffusion/flow-matching models, and its geometric perspective could influence multiple subfields of generative modeling and optimization.

claude-opus-4-6 · Jun 12, 2026

Lost vs. Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper 2 has higher likely scientific impact due to a broadly applicable, technically clean innovation (discrete boundary tokens enabling RL-compatible latent recurrence) that addresses a key bottleneck in training and interpretability of latent reasoning. It offers clear methodological rigor (on-policy RL objective with well-defined ratios, curriculum, mechanistic/causal analyses) and should transfer across tasks and model families, influencing RLHF-style training, efficiency, and interpretability. Paper 1 is novel and timely but depends on costly, narrow fMRI data and may face scalability/generalization constraints, limiting widespread adoption.

gpt-5.2 · Jun 12, 2026

Lost vs. Learning with Simulators: No Regret in a Computationally Bounded World

Paper 2 introduces a fundamentally new theoretical framework (simulatable processes) that broadens the foundational PAC learning model, addressing a long-standing gap in learning theory regarding dependent data. It provides novel connections between computational complexity (time-bounded Kolmogorov complexity) and learnability, with strict separation results. This has broad implications across learning theory, computational complexity, and practical ML. Paper 1 is innovative in using brain signals to guide LLMs, but is more incremental—combining existing neuroimaging with LLM fine-tuning—and its practical scalability (requiring fMRI data) limits broader impact.

claude-opus-4-6 · Jun 12, 2026

Won vs. Understanding and Accelerating the Training of Masked Diffusion Language Models

Paper 2 bridges AI and cognitive neuroscience by directly utilizing human brain fMRI signals to enhance LLM reasoning. This interdisciplinary approach is highly novel and opens a new paradigm for cognitively aligned AI, potentially impacting multiple fields broadly. In contrast, Paper 1, while valuable for accelerating masked diffusion model training, offers a more specialized algorithmic optimization confined to NLP.

gemini-3.1-pro-preview · Jun 11, 2026

Lost vs. Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

Paper 1 addresses a fundamental and widely-relevant problem in LLM post-training—making the learning signal interpretable and controllable—with broad practical applications across the entire LLM development ecosystem. It unifies multiple training protocols under a principled interpretability framework, offering immediate utility to practitioners. Paper 2 is innovative in using brain signals to guide LLM reasoning, but its impact is narrower: it requires fMRI data, addresses a more niche intersection of neuroscience and AI, and the practical scalability of brain-guided approaches remains limited compared to the broadly applicable data-centric pipeline of Paper 1.

claude-opus-4-6 · Jun 11, 2026

Won vs. nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

Paper 1 bridges neuroscience and AI by moving beyond mere correlation to actively using human fMRI brain signals to improve LLM reasoning. This cross-disciplinary approach is highly novel and paradigm-shifting. While Paper 2 offers a rigorous and broadly applicable architectural improvement for multimodal Transformers, Paper 1's method of integrating biological cognitive mechanisms directly into model training addresses a fundamental bottleneck in AI reasoning, offering potentially broader conceptual impact and establishing a new pathway for cognitively aligned AI.

gemini-3.1-pro-preview · Jun 11, 2026

Won vs. Multimodal Ordinal Modeling of Alzheimer's Disease Severity Using Structural MRI and Clinical Data

Paper 1 presents a highly novel paradigm shift—moving from correlational analysis of LLM-brain alignment to using brain signals to actively guide and improve LLM reasoning. This bridges neuroscience and AI in an unprecedented way, demonstrating gains across 10 models of varying scale with transfer across reasoning types. The breadth of impact spans cognitive science, neuroscience, and AI. Paper 2, while methodologically sound, applies relatively incremental improvements (ordinal regression, multimodal fusion) to AD staging, a well-studied problem, with modest performance metrics and narrower scope.

claude-opus-4-6 · Jun 11, 2026

Won vs. Unifying Local Communications and Local Updates for LLM Pretraining

Paper 1 presents a novel and highly interdisciplinary framework that bridges neuroscience and AI by using brain fMRI signals to directly enhance LLM reasoning, moving beyond correlational analysis to causal guidance. This represents a fundamentally new paradigm (brain-guided AI) with broad implications across cognitive science, neuroscience, and AI alignment. The demonstrated improvements across 10 LLMs of varying scales and transfer across reasoning types strengthen its impact. Paper 2, while practically valuable for distributed LLM training efficiency, is more incremental—optimizing existing decentralized training paradigms rather than opening a new research direction.

claude-opus-4-6 · Jun 11, 2026

Won vs. Efficient Time Series Clustering from Multiscale Reservoir Dynamics with Granular-Ball Anchoring Graph Optimization

Paper 2 presents a novel brain-guided framework that moves beyond correlational analyses of LLM-brain correspondence to actually using brain signals (fMRI) to enhance LLM reasoning capabilities. This represents a paradigm shift from passive alignment studies to active neural guidance, with demonstrated improvements across 10 LLMs of varying scales. The interdisciplinary impact spans neuroscience, AI, and cognitive science. Paper 1, while technically sound, addresses a more incremental improvement in time-series clustering efficiency. Paper 2's novelty, broad applicability, and potential to reshape how neuroscience informs AI development give it substantially higher impact potential.

claude-opus-4-6 · Jun 11, 2026
