Noémi Éltető, Nathaniel D. Daw, Kimberly L. Stachenfeld, Kevin J. Miller
Advancing scientific understanding through mechanistic modeling requires posing the right experimental questions to yield maximally informative data. To automate this pursuit within cognitive science, we introduce ATLAS (Active Theory Learning for Automated Science), an active learning framework for the data-driven discovery of interpretable behavioral models. ATLAS iterates between generating mechanistic hypotheses--instantiated as a diverse ensemble of sparse neural networks (Disentangled RNNs)--and designing experiments that optimally distinguish between them. We test this approach on the problem of recovering reinforcement learning agents from their behavior in bandit tasks. ATLAS designs varied sequences of qualitatively novel experiments with temporal structure tailored to underlying agent characteristics. The models trained on these experiments are evaluated against a comprehensive set of metrics for mechanistic modeling that capture behavioral, structural, and computational similarity. ATLAS achieves a 5-10x improvement in sample efficiency across all metrics compared to random experimentation, and its performance is further validated against expert-designed experiments derived from literature. These in silico results showcase ATLAS's potential to accelerate human-interpretable insights in cognitive science and other domains where scientific inquiry relies on discovering mechanistic models.
ATLAS addresses a genuine gap at the intersection of active learning and interpretable mechanistic modeling. The key insight is that existing optimal experiment design methods either optimize black-box models (losing interpretability) or discriminate among pre-specified mechanistic models (risking misspecification). ATLAS bridges this by using ensembles of Disentangled RNNs (DisRNNs) — sparse, interpretable neural networks — as the hypothesis space, and designing experiments that maximally distinguish among these structurally diverse hypotheses.
The framework cycles through three components: (1) a Hypothesis Generator that trains diverse DisRNN ensembles with varying sparsity penalties, (2) an Experiment Optimizer that uses disagreement-based expected information gain to design binary reward matrices, and (3) an Experiment Runner that collects new data. The contribution is primarily architectural and systemic: rather than introducing fundamentally new algorithmic components, it synthesizes Query-by-Committee active learning, DisRNNs, and evolutionary experiment optimization into a coherent pipeline.
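To make the disagreement criterion concrete, here is a minimal sketch of one common Query-by-Committee score: the entropy of the committee's averaged prediction minus the average entropy of its members. This is an illustration only; the paper's exact objective is not restated in this review, and the function names and the `predictions[m][t]` layout (model `m`'s probability of choosing action 0 on trial `t`) are assumptions.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) choice distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def ensemble_disagreement(predictions):
    """Mean per-trial disagreement among committee members.

    predictions[m][t] is (by assumption) model m's probability of choosing
    action 0 on trial t of a candidate experiment.
    """
    n_models = len(predictions)
    n_trials = len(predictions[0])
    total = 0.0
    for t in range(n_trials):
        probs = [predictions[m][t] for m in range(n_models)]
        mean_p = sum(probs) / n_models
        # Entropy of the averaged prediction minus the average member entropy.
        member_entropy = sum(binary_entropy(p) for p in probs) / n_models
        total += binary_entropy(mean_p) - member_entropy
    return total / n_trials

# Two models that make opposite, fully confident predictions on one trial.
score = ensemble_disagreement([[1.0], [0.0]])
```

The score is zero when all models agree and grows as their predictions diverge, so an experiment with a high score is one whose outcome would help rule out some of the competing hypotheses.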
Strengths in evaluation design: The paper evaluates models on three complementary metrics: behavioral similarity (predictive likelihood on held-out experiments), structural similarity (computational graph isomorphism), and dynamical similarity (approximate bisimulation). This multi-faceted evaluation is notably more thorough than typical model recovery studies and provides convincing evidence that models trained on ATLAS-designed experiments not only predict behavior accurately but also recover the true mechanistic structure.
Statistical reporting: The paper uses appropriate statistical tests (Welch's t-tests, Barnard's exact test, Wilson score intervals) and reports results across 8 independent runs, which is reasonable though not extensive.
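For readers who want to sanity-check what 8 runs can support, the following sketch computes a Wilson score interval and Welch's t statistic using only the Python standard library. It is not the paper's analysis code; the function names are hypothetical, and turning the t statistic into a p-value would need a t-distribution CDF (for example from SciPy), which is omitted here.

```python
import math
from statistics import mean, variance

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = variance(x) / len(x)  # squared standard error of mean(x)
    b = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a * a / (len(x) - 1) + b * b / (len(y) - 1))
    return t, df

# Hypothetical example: 6 successful recoveries out of 8 runs.
low, high = wilson_interval(6, 8)
```

With only 8 runs the interval around an observed proportion is wide, which is why the sample size is best described as adequate rather than extensive.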
Concerns: The evaluation is entirely *in silico* — ground-truth agents are known, which makes the recovery problem well-defined but limits claims about real-world applicability. The two test agents (Q-learning and Leaky Actor-Critic) are relatively simple, with 2-3 latent variables and sparse computational graphs. It remains unclear how ATLAS would scale to agents with more complex internal structure, higher-dimensional action spaces, or partial observability. The experiment space is also constrained to binary T×A matrices, which is simpler than many real experimental design problems.
The experiment optimizer uses hill climbing with 128 random restarts — a straightforward approach. While the convergence analysis (Appendix A.1) shows robustness for the simple two-agent case, scalability to larger ensembles and design spaces is not demonstrated.
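A minimal sketch of hill climbing with random restarts over a binary design matrix follows. It assumes the T×A matrix is indexed as trials by actions, uses single-bit flips as the neighborhood, and substitutes a toy objective for the ensemble-disagreement score; `hill_climb` and `toy_score` are hypothetical names, not the paper's implementation.

```python
import random

def hill_climb(score, n_trials, n_actions, n_restarts=128, max_sweeps=1000, seed=0):
    """First-improvement hill climbing over binary n_trials x n_actions matrices."""
    rng = random.Random(seed)
    best_design, best_score = None, float("-inf")
    for _ in range(n_restarts):
        # Each restart begins from an independent random binary matrix.
        design = [[rng.randint(0, 1) for _ in range(n_actions)]
                  for _ in range(n_trials)]
        current = score(design)
        for _ in range(max_sweeps):
            improved = False
            for t in range(n_trials):
                for a in range(n_actions):
                    design[t][a] ^= 1  # flip one reward entry
                    candidate = score(design)
                    if candidate > current:
                        current, improved = candidate, True
                    else:
                        design[t][a] ^= 1  # revert a non-improving flip
            if not improved:
                break  # local optimum: no single flip helps
        if current > best_score:
            best_design = [row[:] for row in design]
            best_score = current
    return best_design, best_score

def toy_score(design):
    """Toy stand-in objective: number of trials with exactly one rewarded action."""
    return sum(1 for row in design if sum(row) == 1)

design, value = hill_climb(toy_score, n_trials=5, n_actions=2, n_restarts=4)
```

Each sweep costs one objective evaluation per matrix entry, so the total cost scales with T×A times the number of sweeps and restarts. That is the scalability concern raised above once the objective itself requires simulating an ensemble.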
Within cognitive science: ATLAS has clear potential to accelerate experimental design for studying reward learning in humans and animals. The 5-10× sample efficiency improvement over random experimentation is substantial, and the comparison against expert-designed experiments (matching or exceeding them) is particularly compelling. If validated with real subjects, this could meaningfully reduce the cost and time of behavioral experiments.
Broader implications: The framework is conceptually general — any domain where interpretable mechanistic models can be expressed as sparse neural networks could potentially benefit. However, the current implementation is tightly coupled to the bandit task structure and DisRNN methodology, limiting immediate transferability.
An intriguing finding is that ATLAS-designed experiments exhibit rich temporal structure (partially overlapping blocks, anomalous trials) that differs qualitatively from human-designed experiments. This suggests ATLAS may discover experimental paradigms that researchers would not naturally consider, potentially opening new avenues for investigation.
The paper sits at a productive intersection of several active research directions: automated scientific discovery, interpretable AI, and optimal experiment design. The growing interest in AI-driven science (as evidenced by recent Nature and PNAS publications cited in the paper) makes this timely. The specific focus on cognitive science, where experiments with human/animal subjects are genuinely expensive and time-limited, provides a compelling motivation.
The reliance on DisRNNs, which have gained traction in cognitive modeling (references 18-21 show growing adoption), makes this a natural extension of an emerging methodology. However, the framework's dependence on this specific model class could limit adoption if alternative interpretable modeling approaches prove more effective in some domains.
Reproducibility: The paper provides substantial methodological detail in appendices, including hyperparameter settings and compute requirements, supporting reproducibility. However, code availability is not mentioned.
ATLAS represents a well-executed synthesis of existing techniques (active learning, interpretable neural networks, evolutionary optimization) into a coherent framework for automated experimental design. Its primary contribution is demonstrating that this combination works — achieving meaningful sample efficiency gains with interpretable outputs. The evaluation methodology, particularly the three-tiered metric system, is a genuine contribution. However, the in-silico-only validation on simple agents limits the strength of claims about real-world scientific impact. The work opens a clear path for future validation with real experimental systems, which would substantially increase its impact.
Generated Jun 11, 2026
Paper 1 introduces a novel theoretical framework providing certified, computable predictability horizons for equivariant world models, with rigorous mathematical guarantees (two-sided bounds tied to Lyapunov spectra). It demonstrates practical applicability to deployed systems (TD-MPC2, V-JEPA 2) and establishes fundamental limits showing scale alone cannot achieve calibrated horizons. This has broad impact across AI safety, robotics, and dynamical systems. Paper 2, while valuable for cognitive science automation, addresses a narrower domain with incremental advances in active learning methodology and is validated only in silico on relatively simple RL agents.
Paper 2 (ATLAS) likely has higher impact due to broader applicability and clearer real-world relevance: automating experiment design for mechanistic theory discovery generalizes beyond cognitive science to robotics, biology, and materials science. It proposes an end-to-end active learning loop (hypothesis generation + optimal experiment selection) with strong empirical gains (5–10× sample efficiency) and evaluation against expert-designed baselines, suggesting methodological rigor and practical utility. Paper 1 offers a novel geometric lens and useful diagnostics for diffusion dynamics, but its primary reach is narrower (generative modeling behavior/interpretability) and more diagnostic than enabling new scientific workflows.
Paper 2 (ATLAS) has higher potential impact: it introduces a general active theory-learning framework that could automate experiment design and mechanistic model discovery, with broad applicability beyond the specific bandit/RL setting (cognitive science, neuroscience, behavioral economics, and other mechanistic sciences). The approach is methodologically substantive (closed-loop hypothesis generation + optimal experiment selection + multi-metric mechanistic evaluation) and timely for automated science. Paper 1 is rigorous and valuable for LLM evaluation/safety, but its impact is more scoped to benchmarking epistemic susceptibility in citation-augmented LLMs.
ATLAS introduces a fundamentally novel framework for automating scientific discovery through active learning of mechanistic models, with broad applicability across cognitive science and other scientific domains. Its 5-10x sample efficiency improvement and comprehensive evaluation methodology represent a significant methodological contribution. While APPO offers useful incremental improvements to agentic RL with fine-grained credit assignment, it is more narrowly focused on optimizing LLM agent performance (~4 point gains). ATLAS has greater potential for cross-disciplinary impact by addressing the fundamental challenge of automated experimental design and scientific model discovery.
Paper 1 is more novel and potentially higher-impact: it frames automated science as active theory learning that jointly generates mechanistic hypotheses and designs discriminative experiments, with strong sample-efficiency gains and evaluation against mechanistic similarity metrics. This has broad applicability beyond the specific bandit setting (e.g., neuroscience, psychology, experimental biology) and is timely for interpretable, automated discovery. Paper 2 is a solid incremental architecture advance (multi-rate MoE + attention for LNNs) with clear applications, but it combines established components and appears narrower in conceptual impact.
Paper 1 (ATLAS) focuses on the automation of scientific discovery, a highly timely area with profound cross-disciplinary potential. By automating hypothesis generation and experimental design, it fundamentally impacts how scientific inquiry is conducted, offering broad applications across cognitive science and beyond. While Paper 2 presents rigorous, valuable advancements in biologically plausible deep learning and optimization, its scope is largely confined to neural network training dynamics. ATLAS's broader goal of accelerating human-interpretable mechanistic modeling gives it a higher potential for widespread, paradigm-shifting scientific impact.
ATLAS presents a more novel and broadly impactful framework combining active learning with automated scientific discovery, addressing a fundamental challenge across multiple scientific domains. Its integration of mechanistic modeling, experiment design, and interpretability offers a paradigm shift for cognitive science and beyond. While TaskFusion addresses a practical niche in continual anomaly detection for heterogeneous tabular data, ATLAS's interdisciplinary reach (AI + cognitive science + philosophy of science), methodological innovation (combining disentangled RNNs with optimal experiment design), and potential to accelerate scientific inquiry give it substantially higher impact potential.
Paper 1 proposes a highly innovative framework for automated science that covers both hypothesis generation and experimental design. Its potential to accelerate mechanistic modeling across various scientific domains gives it far greater breadth of impact and transformative potential than Paper 2, which focuses on a narrower methodological improvement for training efficiency in ECG classification.
ATLAS introduces a novel framework for automated scientific discovery that combines active learning with mechanistic modeling, with broad applicability across cognitive science and other scientific domains. Its potential to fundamentally change how experiments are designed and theories are discovered represents a paradigm-shifting contribution. While Paper 1 (GASLoC) makes a solid engineering contribution to distributed LLM training with practical benefits in heterogeneous settings, it is more incremental—combining existing ideas (gossip protocols, local updates, outer optimizers) in a useful but narrower way. ATLAS's cross-disciplinary impact and novelty in automating the scientific method give it higher long-term impact potential.
ATLAS presents a more broadly impactful contribution: an active learning framework for automated scientific discovery that generalizes across domains. It addresses a fundamental challenge in science—optimal experimental design for mechanistic model discovery—with demonstrated 5-10x sample efficiency gains. Its potential applications span cognitive science, neuroscience, and other experimental sciences. While RePAIR is a solid contribution combining existing self-supervised learning paradigms (MAE, JEPA, BERT) applied to chess, its domain is narrower and the novelty is more incremental, combining known techniques in a specific application rather than enabling a new scientific methodology.