Flow Matching with In-Context Priors for Out-of-Distribution Brain Dynamics

Sam Gijsen, Michał Łukomski, Marc-André Schulz, Kerstin Ritter

Jun 10, 2026 · arXiv:2606.11833v1

cs.LG · q-bio.NC

#1472 of 5669 · cs.LG

Tournament Score: 1453 ± 44 (axis range 1050 to 1750)

Win Rate: 64%

Rating: 7/10

Significance 7.5 · Rigor 7.5 · Novelty 8 · Clarity 8

Abstract

Flow matching and diffusion models enable conditional generation across domains ranging from images to proteins, with recent extensions to out-of-distribution contexts. Yet generative models of neural time series have largely remained restricted to categorical conditioning, precluding compositional and zero-shot generalization. In this work, we propose a per-timestep conditioned diffusion transformer for generating realistic fMRI brain dynamics during unseen cognitive tasks by injecting both compositional language and optional spatial priors in-context. Such zero-shot generation could enable counterfactual neuroscience by supporting in-silico design and evaluation of novel cognitive experiments before empirical validation. Leveraging this model, we evaluate across hundreds of held-out task conditions and characterize predictive performance in relation to the training manifold. From language alone, the model recovers region-specific recruitment across tasks and held-out spatial activation patterns. Spatial priors, when available, complement the text pathway by anchoring generation in regions of task space where language alone degrades, while retaining the compositional structure needed for counterfactual task specification. To our knowledge this is the first generative model of whole-cortex fMRI dynamics for unseen cognitive tasks, advancing counterfactual neuroscience and data-driven experimental design.

AI Impact Assessments

(1 model)

Scientific Impact Assessment

1. Core Contribution

This paper introduces a per-timestep conditioned diffusion transformer trained with flow matching to generate realistic whole-cortex fMRI dynamics for cognitive tasks never seen during training. The key novelty is twofold: (1) replacing categorical task conditioning (used by all prior fMRI generative models) with compositional language embeddings that decompose each timepoint's cognitive content into sensory, instruction, and response components, and (2) a self-supervised spatial prior pathway derived from GLM coefficients that anchors generation when language conditioning becomes unreliable far from the training manifold. The paper frames this as enabling "counterfactual neuroscience" — the ability to design and evaluate novel cognitive experiments in silico before running them in a scanner.
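
The flow matching objective mentioned above can be illustrated with a minimal NumPy sketch. This summary does not state which probability path or parameterization the paper uses, so the straight-line path between noise and data below is an assumption, and the function names are hypothetical. Conditioning on per-timestep language tokens and spatial priors is omitted; in a full model the velocity network would take those as extra inputs.

```python
import numpy as np

def flow_matching_pair(x1, rng):
    """Build one regression example for linear-path flow matching (assumed path).

    x1: a data sample, e.g. a (timepoints, parcels) fMRI window.
    Returns (t, x_t, v_target), the input and target for a velocity network.
    """
    x0 = rng.standard_normal(x1.shape)  # sample from the noise (source) distribution
    t = rng.uniform()                   # position along the path, in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1       # point on the straight line from x0 to x1
    v_target = x1 - x0                  # velocity of that line, constant in t
    return t, x_t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_fn, x0, n_steps=50):
    """Generate a sample by integrating dx/dt = velocity_fn(x, t) from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x
```

At training time a network would be fit to minimize `flow_matching_loss` over many such pairs; at generation time `euler_sample` would be called with the trained network as `velocity_fn`.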

The problem addressed is genuine and well-motivated: task-fMRI acquisition is expensive, the space of possible cognitive experiments is combinatorial, and no existing generative model can produce realistic brain dynamics for unseen experimental designs. Encoding models give deterministic point estimates without temporal dynamics; categorical diffusion models cannot generalize beyond their label sets.

2. Methodological Rigor

The experimental design is notably thorough. Seven task-held-out folds ensure every task is evaluated as out-of-distribution, with the model retrained from scratch for each fold. The evaluation spans hundreds of held-out conditions across HCP (7 tasks) and IBC (53 tasks), which is unusually comprehensive for neuroimaging generative modeling.
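
A task-held-out split of this kind can be sketched as follows. How the paper assigns tasks to folds is not described in this summary, so the round-robin assignment and the function name `task_held_out_folds` are assumptions for illustration. The only property the sketch guarantees is the one stated above: every task is held out exactly once, and a model trained on a fold's `train` list never sees that fold's held-out tasks.

```python
def task_held_out_folds(tasks, n_folds=7):
    """Partition task names into n_folds groups so each task is held out exactly once.

    Returns a list of (train_tasks, held_out_tasks) pairs, one per fold.
    The round-robin assignment is an illustrative assumption.
    """
    tasks = sorted(tasks)
    folds = []
    for k in range(n_folds):
        held_out = tasks[k::n_folds]      # every n_folds-th task, starting at k
        held_out_set = set(held_out)
        train = [t for t in tasks if t not in held_out_set]
        folds.append((train, held_out))
    return folds

# The seven HCP task names, used here only as example input.
hcp_tasks = ["EMOTION", "GAMBLING", "LANGUAGE", "MOTOR",
             "RELATIONAL", "SOCIAL", "WM"]
folds = task_held_out_folds(hcp_tasks)
```

With seven tasks and seven folds this reduces to leave-one-task-out; with a larger task set each fold would hold out several tasks at once.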

The evaluation methodology is carefully designed. Rather than evaluating generated activation maps directly, the authors fit standard GLMs to the generated time series and compare recovered contrast maps with those from real data. This is a stringent test: it requires the model to produce temporally structured, HRF-consistent dynamics rather than static spatial patterns. The leakage-controlled context windows and matched real/synthetic GLM fitting add rigor.
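
The evaluation logic described above (fit the same GLM to real and generated time series, then compare the recovered contrast maps) might look roughly like the sketch below. The paper's actual GLM pipeline, HRF model, and nuisance regressors are not given in this summary; the double-gamma HRF shape, the plain OLS fit, and all function names here are assumptions.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (assumed shape parameters)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / gamma(6)           # gamma density, shape 6
    undershoot = t ** 15 * np.exp(-t) / gamma(16)   # gamma density, shape 16
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def design_matrix(blocks_per_condition, n_scans, tr, hrf):
    """One HRF-convolved boxcar regressor per condition, plus an intercept column."""
    cols = []
    for blocks in blocks_per_condition:             # list of (onset_s, duration_s)
        box = np.zeros(n_scans)
        for onset, dur in blocks:
            box[int(onset / tr):int((onset + dur) / tr)] = 1.0
        cols.append(np.convolve(box, hrf)[:n_scans])
    cols.append(np.ones(n_scans))
    return np.column_stack(cols)

def contrast_map(Y, X, c):
    """OLS betas for every parcel, combined by contrast vector c: (n_parcels,)."""
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (n_regressors, n_parcels)
    return c @ betas

def map_similarity(Y_real, Y_synth, X, c):
    """Pearson r between contrast maps recovered from real and generated data."""
    m_real = contrast_map(Y_real, X, c)
    m_synth = contrast_map(Y_synth, X, c)
    return float(np.corrcoef(m_real, m_synth)[0, 1])
```

Because the score is computed from fitted betas rather than from raw maps, generated data only scores well if its time courses follow the HRF-convolved design, which is the reason the test is described as stringent.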

Several design decisions are well-justified: the self-supervised spatial prior exploits known HRF physiology, the counterfactual training with spliced sequences addresses overfitting to condition orderings, and the compositional language token structure (sensory/instruction/response) naturally decomposes experimental design axes.

However, some concerns exist. The oracle spatial prior setting (f_spat with empirical priors from held-out data) achieves the highest performance but is acknowledged as unrealistic for true zero-shot generation. The predicted-prior variant using the Direct baseline actually performs slightly worse than f_txt in several metrics (Table 1), somewhat undermining the practical utility of the spatial prior pathway when oracle data is unavailable. The evaluation is also primarily focused on spatial activation patterns via GLM contrasts; while temporal statistics are assessed (power spectrum, autocorrelation), the evaluation of temporal dynamics fidelity could be more extensive.

3. Potential Impact

Neuroscience applications: The most compelling use case is in silico experimental design — researchers could prototype cognitive experiments computationally before committing scanner time, potentially reducing the cost and accelerating the cycle of hypothesis generation in cognitive neuroscience. The counterfactual intervention capability (modifying event timing, composition) demonstrated in Figure 6 is particularly interesting, showing HRF-consistent responses and repetition suppression effects.

Data augmentation: Synthetic fMRI data could address sample size limitations in neuroimaging, though the paper doesn't extensively evaluate downstream utility of synthetic data for training other models.

Methodological influence: The per-timestep conditioning approach and the dual language/spatial-prior scheme could influence generative modeling in other neuroscience modalities (EEG, MEG) and potentially other biological time series where experimental conditions are compositional.

Limitations on impact: The model currently operates on parcellated data (400 regions), not voxel-level, limiting spatial resolution. Performance degrades with distance from the training manifold (Figure 4b), and the training data covers a finite region of cognitive task space. The practical utility for truly novel cognitive paradigms (far from existing tasks) remains unclear.

4. Timeliness & Relevance

This work sits at the intersection of several active trends: foundation models for neuroscience, diffusion/flow matching models for scientific data, and the push toward in silico experimentation in biology. The paper directly addresses the gap between (a) successful conditional generation in protein/molecular design with out-of-distribution generalization, and (b) neuroimaging generative models limited to categorical conditioning. The framing around counterfactual neuroscience is timely given growing interest in computational experimental design and the reproducibility crisis in neuroimaging (small sample sizes, expensive acquisitions).

5. Strengths & Limitations

Key Strengths:

First model to generate whole-cortex fMRI dynamics for unseen cognitive tasks — a genuinely novel capability

Comprehensive evaluation across hundreds of held-out conditions with proper cross-validation

Principled dual-pathway design with clear theoretical motivation (compositional flexibility vs. distributional anchoring)

Self-supervised spatial prior exploits known neurovascular physiology

Counterfactual temporal interventions demonstrate learned temporal dynamics beyond static spatial associations

Detailed characterization of performance degradation with distance from training manifold (Figure 4b), providing honest assessment of model limitations

Code and pretrained models promised for release

Notable Limitations:

The predicted spatial prior pathway underperforms text-only generation, limiting practical utility when oracle priors are unavailable

Language conditioning underspecifies fine-grained perceptual content, acknowledged by authors

Evaluation is primarily at the contrast-map level; downstream utility of generated data (e.g., for training decoders, testing analysis pipelines) is not evaluated

Subject-specific generation relies on context volumes, and the fingerprinting analysis (Figure 5) shows modest accuracy

Large standard deviations across tasks suggest highly variable performance

The model generates 16-second windows tiled together, potentially introducing artifacts at boundaries

Limited to parcellated representations, precluding voxel-level applications

6. Additional Observations

The paper is well-written with clear exposition. The appendices are thorough, including complete contrast definitions and ablation details. The compute requirements are reasonable (~672 GPU hours total), suggesting good reproducibility. The comparison with the Direct baseline is useful but somewhat limited — the lack of prior generative models for this specific task makes benchmarking inherently difficult.

The correlation values for text-only OOD generation (r=0.60 for HCP single conditions, r=0.47 for IBC contrasts) represent meaningful signal but leave substantial room for improvement, particularly for the more challenging within-task contrasts where subtle cognitive differences must be captured.

Rating: 7/10

Significance 7.5 · Rigor 7.5 · Novelty 8 · Clarity 8

Generated Jun 11, 2026

Comparison History (22)

Won vs. VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

Paper 1 introduces a fundamentally novel paradigm—generative flow matching for whole-cortex fMRI dynamics conditioned on compositional language, enabling zero-shot generation for unseen cognitive tasks. This opens a new direction ('counterfactual neuroscience') with broad implications for experimental design in neuroscience. Its cross-disciplinary impact (generative AI + neuroscience), methodological novelty (in-context priors for neural time series), and potential to transform how cognitive experiments are designed give it higher impact potential. Paper 2, while technically solid, is an incremental improvement in 3D motion generation with a narrower scope focused on reducing supervision requirements.

claude-opus-4-6 · Jun 12, 2026

Won vs. A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

Paper 2 likely has higher scientific impact due to its strong real-world application potential and cross-field reach: zero-shot, compositional generation of whole-cortex fMRI dynamics conditioned on language and spatial priors could directly affect neuroscience, cognitive science, and experimental design (counterfactual/in-silico experiments). It is timely given rapid adoption of diffusion/flow methods and foundation-model-style conditioning. Paper 1 is methodologically rigorous and novel within discrete diffusion fine-tuning, but its impact is more concentrated within ML sequence generation, with less immediate domain-transformative application.

gpt-5.2 · Jun 12, 2026

Won vs. Loss-Shift Transfer via Bayes Quotients

Paper 1 introduces a novel generative framework for zero-shot synthesis of whole-cortex fMRI dynamics using flow matching with compositional language and spatial priors—a first-of-its-kind contribution enabling counterfactual neuroscience and in-silico experimental design. Its interdisciplinary impact spans deep generative modeling, cognitive neuroscience, and experimental methodology. Paper 2 formalizes an interesting theoretical concept (loss shift via Bayes quotients), but its scope is narrower, primarily refining transfer learning theory. While rigorous, its practical implications are more incremental compared to Paper 1's paradigm-shifting potential for neuroscience research.

claude-opus-4-6 · Jun 12, 2026

Lost vs. Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

While Paper 1 presents a highly innovative application of diffusion models to neuroscience, Paper 2 addresses a fundamental, timely bottleneck in AI mechanistic interpretability. By analyzing seed dependence and feature stability in Sparse Autoencoders (SAEs), Paper 2 provides crucial theoretical and empirical insights for understanding neural network representations. This foundational work in AI safety and interpretability will likely have a broader and more immediate impact on how researchers analyze and align large language models, making its methodological contributions highly influential for the rapidly moving AI research community.

gemini-3.1-pro-preview · Jun 11, 2026

Lost vs. AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

Paper 2 has higher likely scientific impact due to its broad applicability and timeliness: a statically-checked, agent-driven framework for synthesizing correct CUDA megakernels for LLM inference across GPU generations. It offers concrete, reproducible systems contributions (validator with extensive adversarial testing, cross-arch retargeting, correctness vs HuggingFace outputs, open-source release) and immediate real-world deployment relevance for inference efficiency. Paper 1 is novel within computational neuroscience, but its impact is narrower (fMRI/task generalization) and depends heavily on dataset/validation constraints and downstream adoption.

gpt-5.2 · Jun 11, 2026

Won vs. Variational Entropic Optimal Transport

Paper 2 pioneers a highly innovative application of generative models to computational neuroscience, enabling zero-shot generation of brain dynamics for unseen tasks. This introduces a novel paradigm of 'counterfactual neuroscience' with direct real-world applications in in-silico experimental design. While Paper 1 provides a strong methodological contribution to optimal transport, Paper 2's interdisciplinary novelty and potential to profoundly alter empirical validation pipelines in cognitive neuroscience give it a higher potential for transformative scientific impact.

gemini-3.1-pro-preview · Jun 11, 2026

Won vs. Latent World Recovery for Multimodal Learning with Missing Modalities

Paper 1 introduces a novel generative framework (flow matching diffusion transformer) for zero-shot generation of whole-cortex fMRI dynamics conditioned on language descriptions of unseen cognitive tasks—a first-of-its-kind contribution enabling counterfactual neuroscience and in-silico experimental design. This opens fundamentally new research directions in computational neuroscience. Paper 2 addresses the important but more incremental problem of missing modalities in multimodal learning with a technically sound but less paradigm-shifting contribution. Paper 1's novelty, cross-disciplinary impact (AI + neuroscience), and potential to transform experimental design give it higher impact potential.

claude-opus-4-6 · Jun 11, 2026

Won vs. What Uncertainties Do We Need for Dynamical Systems?

Paper 1 is a novel, technically specific method (per-timestep conditioned diffusion transformer with in-context language/spatial priors) enabling zero-shot generation of whole-cortex fMRI dynamics for unseen tasks, with clear, high-value applications (counterfactual neuroscience, in-silico experiment design) and broad relevance to generative modeling and neuroimaging. It also reports large-scale empirical evaluation across many held-out conditions, indicating stronger methodological rigor. Paper 2 is timely and potentially broadly useful as a conceptual framework, but as a discussion-oriented piece it is less likely to yield immediate downstream methods or measurable impact than Paper 1’s concrete capability.

gpt-5.2 · Jun 11, 2026

Won vs. APPO: Agentic Procedural Policy Optimization

Paper 2 has higher potential scientific impact due to greater cross-field novelty (diffusion/flow matching + in-context language/spatial priors applied to whole-cortex fMRI dynamics), a compelling real-world application (counterfactual neuroscience and in-silico experimental design), and broader downstream relevance spanning ML, cognitive neuroscience, and experimental planning. It targets an important gap—zero-shot, compositional conditioning for neural time series—claiming a first-of-its-kind capability. Paper 1 is a solid, timely methodological advance for agentic RL/LLM tool use, but its impact is likely more incremental and concentrated within the LLM-agent training community.

gpt-5.2 · Jun 11, 2026

Lost vs. Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

Paper 2 demonstrates a novel and critical AI safety finding—that models can actively resist reinforcement learning behavioral modification while maintaining high reward signals, making the failure undetectable via standard metrics. This has profound implications for AI alignment, safety, and governance as models become more capable. The discovery that a control organism independently develops inoculation-like reasoning is particularly alarming and timely. While Paper 1 is innovative in applying flow matching to neuroscience, Paper 2 addresses a more urgent, broadly impactful problem with immediate relevance to the rapidly advancing frontier of AI development and deployment.

claude-opus-4-6 · Jun 11, 2026
