Xiangsheng Ge, Yang Xie
Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to a fixed set of recorded neurons. This coupling limits reuse in long-term brain-computer interfaces, where recorded neuron identities, counts, and response statistics can change across days. We introduce GRAFT, a Transformer-based neural population activity model that separates reusable temporal dynamics from a recalibratable neuron interface. The neuron interface controls how recorded neurons enter and leave the shared backbone, and auxiliary gain and positional mechanisms support neural activity modeling inside the Transformer. On MC Maze under the standard NLB'21 protocol, GRAFT reaches 0.3866 co-bps as an ensemble, setting a new state of the art on the primary co-bps metric among public and reported NLB'21 results. In a cross-day protocol constructed from the NLB'21 MC Maze dataset series, GRAFT recalibrates from MC Maze to the scaled MC Maze datasets (Large/Medium/Small) by updating only 9.21% of parameters, reaching 0.3749, 0.3112, and 0.3152 co-bps with restricted target-day support sets. These results show that the same interface-backbone separation supports both strong Transformer-based neural population activity modeling and data-efficient cross-day recalibration.
GRAFT introduces an architectural separation between a reusable Transformer dynamics backbone and a recalibratable neuron interface for neural population activity modeling. The key insight is that temporal dynamics learned from one recording session can be reused across days, while only the neuron-specific read-in/readout pathways need updating when the recorded neural population changes. This is operationalized through: (1) neuron-specific learnable embeddings that parameterize gain-modulated read-in and readout pathways, (2) value-side attention gain modulation within the Transformer, (3) neural positional encoding combining absolute trial-stage position with relative temporal distance bias, and (4) interface-level contrastive consistency regularization during training. The paper targets two problems at once: state-of-the-art within-session neural population modeling (0.3866 co-bps on MC Maze) and cross-day recalibration that updates only 9.21% of parameters.
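To make the interface-backbone split concrete, here is a minimal sketch, not the authors' implementation. It assumes that each recorded neuron carries a learnable embedding, that a positive scalar gain is derived from that embedding through a softplus, and that the read-in sums gain-scaled spike counts along each neuron's embedding direction. The class name `NeuronInterface`, the method names, and the exact gain formula are illustrative assumptions; the paper summary only states that embeddings parameterize gain-modulated read-in and readout pathways.

```python
import math
import random


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class NeuronInterface:
    """Hypothetical per-session interface: one embedding per recorded neuron."""

    def __init__(self, n_neurons, d_model, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Session-specific parameters: these would be recalibrated
        # when the recorded population changes.
        self.embeddings = [
            [rng.gauss(0.0, 0.1) for _ in range(d_model)]
            for _ in range(n_neurons)
        ]
        self.gain_weights = [rng.gauss(0.0, 0.1) for _ in range(d_model)]

    def gain(self, i):
        # Assumed form: positive scalar gain computed from neuron i's embedding.
        e = self.embeddings[i]
        return softplus(sum(w * x for w, x in zip(self.gain_weights, e)))

    def read_in(self, spikes_t):
        # Map one time bin of spike counts (length n_neurons)
        # to a fixed-width vector of length d_model.
        out = [0.0] * self.d_model
        for i, count in enumerate(spikes_t):
            g = self.gain(i)
            for k in range(self.d_model):
                out[k] += g * count * self.embeddings[i][k]
        return out

    def n_params(self):
        return len(self.embeddings) * self.d_model + len(self.gain_weights)


# Two sessions with different neuron counts produce inputs of the same width,
# so a shared (frozen) backbone could consume either one.
day_a = NeuronInterface(n_neurons=5, d_model=4, seed=1)
day_b = NeuronInterface(n_neurons=3, d_model=4, seed=2)
token_a = day_a.read_in([1, 0, 2, 0, 1])
token_b = day_b.read_in([0, 3, 1])
```

In this sketch the backbone width `d_model` is independent of the neuron count, which is the property that lets the temporal model stay fixed while only the interface parameters are refit on a new day.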
The experimental design is generally sound, with clear separation of train/validation/test splits and explicit statements about which data inform which decisions. The authors are commendably transparent about test-set overfitting risks, providing ensemble sensitivity analyses in Table 4 that demonstrate robustness to ensemble selection criteria.
However, several methodological concerns emerge:
Cross-day protocol limitations. The cross-day recalibration protocol is constructed from the NLB'21 MC Maze dataset series, which was originally designed for data-scaling evaluation, not cross-day transfer. While the datasets do come from different recording dates with different neuron counts, they involve the same monkey performing the same task within an 11-day window, which makes this a relatively benign transfer scenario. The paper acknowledges this limitation, but the cross-day claims should still be interpreted cautiously.
Baseline fairness. The cross-day comparison is somewhat asymmetric: GRAFT uses a pre-trained backbone from MC Maze (1721 training trials) plus restricted target-day support, while baselines use only target-day training data. The comparison would be more informative if baselines also had access to source-day pre-training, or if a simple fine-tuning baseline (updating all parameters from the source-day model) were included. The claim of exceeding AutoLFADS with restricted support is meaningful but partly reflects the advantage of transfer learning itself rather than the specific interface-backbone separation.
Ablation depth. The ablations are well-structured and informative, showing consistent (if modest) contributions from each component. The cross-day ablation on repeated masking (Rmask) and frozen read-in/readout MLPs provides useful architectural insights. However, the absolute co-bps differences in source-day ablations (0.005–0.008) are small enough that statistical significance across random seeds would strengthen the claims.
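One way such a seed-level check could look is sketched below: an exact paired sign-flip permutation test on per-seed co-bps scores. This is an assumed analysis, not one reported in the paper, and the score lists are made-up placeholders chosen only to show the mechanics. The function name `paired_sign_flip_pvalue` is hypothetical.

```python
import itertools
import statistics


def paired_sign_flip_pvalue(full, ablated):
    """Exact two-sided sign-flip permutation test on paired per-seed scores.

    Returns (mean difference, p-value). Under the null hypothesis that the
    ablation has no effect, each paired difference is equally likely to be
    positive or negative, so all 2**n sign patterns are enumerated.
    """
    diffs = [f - a for f, a in zip(full, ablated)]
    observed = abs(statistics.mean(diffs))
    count = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = statistics.mean([s * d for s, d in zip(signs, diffs)])
        # Small tolerance guards against floating-point ties.
        if abs(flipped) >= observed - 1e-12:
            count += 1
        total += 1
    return statistics.mean(diffs), count / total


# Placeholder values, NOT results from the paper: five seeds per variant.
full_model = [0.360, 0.362, 0.359, 0.361, 0.363]
ablated_model = [0.354, 0.357, 0.353, 0.355, 0.356]
mean_diff, p_value = paired_sign_flip_pvalue(full_model, ablated_model)
```

With five seeds, the smallest p-value this test can return is 2/32 = 0.0625, even when every seed favors the full model, so at least six seeds per variant would be needed before differences of this size could clear a 0.05 threshold under this particular test.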
BCI recalibration. The most impactful potential application is in long-term brain-computer interfaces, where daily recalibration is a major practical barrier. If the interface-backbone separation generalizes beyond the controlled MC Maze setting, updating only 9.2% of parameters could substantially reduce calibration time and data requirements in clinical deployments.
Neural population modeling. The state-of-the-art co-bps result on MC Maze (albeit only 0.0004 above the STNDT ensemble: 0.3866 vs. 0.3862) demonstrates that the architectural innovations do not sacrifice modeling quality for recalibration flexibility. This is a useful existence proof.
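For readers unfamiliar with the metric, co-bps (co-smoothing bits per spike) in NLB'21 is the Poisson log-likelihood gain of the predicted rates for held-out neurons over a per-neuron mean-rate null model, normalized by the spike count and converted to bits. The sketch below follows that definition using only the standard library; the function names are illustrative and are not the official evaluation code, and the toy arrays are invented for demonstration.

```python
import math


def poisson_log_likelihood(rates, spikes):
    # Sum over bins of log P(y | lambda) for a Poisson distribution.
    # Rates must be strictly positive.
    total = 0.0
    for lam, y in zip(rates, spikes):
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


def bits_per_spike(pred_rates, spikes):
    """Log-likelihood gain over a mean-rate null model, in bits per spike.

    pred_rates, spikes: lists over neurons, each a list over time bins
    (trials flattened). Every neuron is assumed to fire at least once,
    so that its null rate is positive.
    """
    n_spikes = sum(sum(neuron) for neuron in spikes)
    ll_model = 0.0
    ll_null = 0.0
    for rates_n, spikes_n in zip(pred_rates, spikes):
        mean_rate = sum(spikes_n) / len(spikes_n)
        ll_model += poisson_log_likelihood(rates_n, spikes_n)
        ll_null += poisson_log_likelihood([mean_rate] * len(spikes_n), spikes_n)
    return (ll_model - ll_null) / (n_spikes * math.log(2))


# Toy held-out spikes for two neurons over four bins (invented data).
held_out = [[0, 2, 0, 2], [1, 1, 3, 3]]
null_rates = [[1.0] * 4, [2.0] * 4]          # each neuron's mean rate
informed_rates = [[0.1, 1.9, 0.1, 1.9], [1.0, 1.0, 3.0, 3.0]]

bps_null = bits_per_spike(null_rates, held_out)
bps_informed = bits_per_spike(informed_rates, held_out)
```

A model that predicts only each neuron's mean rate scores zero, and any score above zero reflects time-varying structure captured beyond that baseline. On this scale, a gap of 0.0004 bits per spike between two ensembles is very small relative to the absolute scores.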
Broader transferability. The neuron embedding approach—where each neuron is represented by a learnable vector that parameterizes its interface with a shared backbone—could influence how other neural data models handle variable-size populations, though this idea is not entirely new (cf. SPINT).
The paper addresses a genuine and timely bottleneck. The BCI field is increasingly focused on long-term stability, as evidenced by recent work on NoMAD, SPINT, FALCON, and plug-and-play stability approaches. The NLB'21 benchmark remains a relevant evaluation standard. The combination of strong within-session modeling with cross-day transfer in a single architecture addresses a real need, as most prior work optimizes for one or the other.
The paper's positioning relative to SPINT (Le et al. 2025) deserves attention, as both address variable neural populations with embedding-based approaches. A more detailed architectural and empirical comparison would clarify GRAFT's specific advantages. The gain modulation mechanism, while computationally motivated, provides an interesting connection to neuroscience that could be developed further.
The work represents a solid engineering contribution to neural population modeling with a well-motivated architectural design, but its impact is currently bounded by the narrow evaluation setting and marginal improvements on the primary benchmark.
Generated Jun 10, 2026
Paper 1 addresses a critical bottleneck in brain-computer interfaces (chronic recording instability) through a highly innovative architecture that separates temporal dynamics from neural interfaces. Its ability to achieve SOTA performance and efficient cross-day recalibration marks a significant methodological advance. In contrast, Paper 2 presents a more incremental, albeit useful, application of existing tabular foundation models to survival analysis, showing relatively modest performance gains over established baselines.
Paper 1 addresses a critical bottleneck in Brain-Computer Interfaces—long-term instability due to changing neuron recordings across days. By decoupling the neural interface from the temporal backbone and achieving SOTA performance with high data efficiency, it paves the way for practical, long-term BCI applications. While Paper 2 offers a valuable technique for robotic manipulation, Paper 1's solution to a major translational hurdle in neuro-AI presents higher potential for transformative scientific and clinical impact.
ATLAS presents a more broadly impactful framework for automating scientific discovery through active learning, applicable across cognitive science and other domains requiring mechanistic modeling. Its novelty lies in combining active experiment design with interpretable model discovery, achieving significant (5-10x) sample efficiency gains. While GRAFT makes a solid engineering contribution to neural population modeling and BCI recalibration with a new state-of-the-art benchmark result, its impact is more narrowly scoped to the BCI/neural decoding community. ATLAS's potential to transform how scientific inquiry is conducted gives it broader cross-disciplinary significance.
Paper 1 addresses a fundamental theoretical question about the relationship between data symmetries and conservation laws in neural network training dynamics, with broad implications across deep learning theory. It introduces a novel mathematical framework (tensorizable networks) that encompasses multiple architectures. While Paper 2 presents a strong applied contribution to brain-computer interfaces with state-of-the-art results, its impact is more narrowly scoped to neural population modeling. Paper 1's theoretical insights about symmetry, conservation laws, and gradient flow have potential to influence a wider range of fields and future research directions.
Paper 2 addresses a fundamental bottleneck in brain-computer interfaces (BCIs)—cross-day recalibration due to changing neural populations. By decoupling temporal dynamics from the neuron interface, it sets a new state-of-the-art on a standard benchmark while drastically reducing the data needed for recalibration. This has profound implications for long-term biomedical and neuroscientific applications. While Paper 1 offers a valuable industrial application of RL, Paper 2 demonstrates higher foundational scientific impact by advancing the rapidly growing field of neuro-AI and addressing a critical hurdle in viable, long-term neural prosthetics.
Paper 1 resolves longstanding open problems in optimization complexity theory by proving matching lower bounds for higher-order smooth nonconvex optimization, closing fundamental gaps (e.g., the ε^{-7/4} and ε^{-5/3} rates). This has broad theoretical impact across optimization, machine learning, and computational complexity. Paper 2 presents a solid engineering contribution to neural population modeling with incremental improvements on a specific benchmark, but its scope and fundamental impact are narrower. Paper 1's dimension-free construction and complete resolution of open questions give it greater lasting significance.
Paper 2 introduces a fundamental new concept (epistemic calibration) with broad applicability across all of machine learning, supported by theoretical results (impossibility theorem, consistency proofs) and empirical validation. It addresses a universal gap in uncertainty quantification that affects any high-stakes deployment. Paper 1, while achieving strong results on neural population modeling benchmarks and addressing practical BCI recalibration, is more incremental and domain-specific—combining existing ideas (Transformers, adapters, gain modulation) for a narrower neuroscience/BCI application. Paper 2's theoretical contributions have potential to influence calibration research broadly.
Paper 2 (QGF) addresses a broader and more impactful problem: enabling stable RL with expressive generative policies by shifting optimization to test time. This has wide applications across robotics, imitation learning, and RL, touching multiple active research communities. Its insight that test-time guidance can replace unstable actor-critic training is novel and practically significant, especially given favorable scaling properties. Paper 1 (GRAFT), while technically strong with state-of-the-art NLB results, addresses a narrower niche (cross-day BCI recalibration) with more incremental contributions combining existing techniques (Transformers, adapters, gain modulation).
Paper 2 likely has higher scientific impact due to broader cross-domain relevance and timeliness: it proposes a general evaluation framework and metric (ICR) for diffusion models that connects representation learning, generalization, and memorization—issues central across ML, vision, and generative modeling. Its potential applications include training diagnostics and model selection without external evaluators, which could influence many diffusion-based pipelines. Paper 1 is rigorous and impactful within neural population modeling/BCI, but its scope is narrower and depends on a specific benchmark and setting.
Paper 1 likely has higher scientific impact due to broad, timely relevance to privacy in large language models—a central deployment barrier across many sensitive real-world applications. Its benchmarking of empirical privacy leakage under DP adaptation across distribution shift and adaptation methods addresses a widely recognized gap between theoretical guarantees and practical risk, with actionable guidance and a proposed end-to-end assessment framework. Paper 2 is methodologically strong and impactful for neural decoding/BCI, but its domain is narrower and affects fewer fields compared to privacy evaluation frameworks for LLM adaptation.