Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli
Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.
This paper identifies a specific distribution mismatch problem: tabular foundation models (TFMs) like TabPFN-3 are pretrained on generic synthetic regression tasks, but when deployed as surrogates in latent-space Bayesian optimization (LSBO), they encounter a fundamentally different data distribution — one shaped by VAE latent codes, molecular property landscapes, and value-biased BO histories. The proposed solution, LILBO, performs continued pretraining of TabPFN-3 using synthetically generated molecular LSBO episodes. These episodes are constructed by combining molecular base tasks (logP, QED, similarity scores) through random linear, MLP, or formula-tree combiners, with Boltzmann-sampled contexts that mimic the value-biased nature of BO queries. An L2-SP regularizer anchors the adapted model to the original checkpoint to prevent catastrophic forgetting.
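The episode construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`linear_combiner`, `boltzmann_context`, `l2_sp_penalty`), the temperature, the combiner weights, and the toy base-task scores are all hypothetical. Only the linear combiner is shown, and the L2-SP term uses the standard form, half the strength times the squared distance to the starting checkpoint.

```python
import math
import random


def linear_combiner(base_values, weights):
    """Combine per-molecule base-task scores into one synthetic objective.

    base_values: list of rows, one row of base-task scores per molecule.
    weights: one (randomly drawn) weight per base task.
    """
    return [sum(w * v for w, v in zip(weights, row)) for row in base_values]


def boltzmann_context(values, k, temperature, rng):
    """Sample k distinct indices with probability proportional to exp(value / T).

    Lower temperatures bias the context toward high-value points, which is
    one way to imitate the value-biased history of a BO run.
    """
    v_max = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - v_max) / temperature) for v in values]
    candidates = list(range(len(values)))
    chosen = []
    for _ in range(k):
        total = sum(weights[i] for i in candidates)
        r = rng.random() * total
        acc = 0.0
        for i in candidates:
            acc += weights[i]
            if r <= acc:
                break
        chosen.append(i)        # i is the last candidate if rounding left r > acc
        candidates.remove(i)    # sampling without replacement
    return chosen


def l2_sp_penalty(params, anchor, strength):
    """L2-SP penalty: 0.5 * strength * ||params - anchor||^2."""
    return 0.5 * strength * sum((p - a) ** 2 for p, a in zip(params, anchor))


# Toy demo with made-up scores for five molecules and three base tasks.
rng = random.Random(0)
base = [[rng.random() for _ in range(3)] for _ in range(5)]
objective = linear_combiner(base, weights=[0.5, 0.3, 0.2])
context_idx = boltzmann_context(objective, k=3, temperature=0.5, rng=rng)
penalty = l2_sp_penalty([0.9, 1.1], anchor=[1.0, 1.0], strength=0.1)
```

In a real training loop, `params` and `anchor` would be the flattened weights of the adapted model and of the original checkpoint, and the penalty would be added to the in-context prediction loss on each sampled episode.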
The contribution is conceptually clean: bridge the gap between generic tabular pretraining and domain-specific LSBO deployment through lightweight continued pretraining. This is a reasonable and well-motivated idea.
The paper operates at the intersection of two active research areas — tabular foundation models and latent-space Bayesian optimization — making it timely. The idea that TFM surrogates should be adapted to their deployment domain is broadly applicable beyond molecular design (proteins, materials, etc.).
However, the practical impact is limited by several factors:
The more impactful finding may actually be the secondary one: that TabPFN-3, as a plug-in surrogate without any adaptation, is already competitive with specialized LSBO methods. This observation alone could influence how practitioners approach LSBO.
The paper is well-timed. TFMs are rapidly being adopted as BO surrogates (PFNs4BO, GIT-BO), and LSBO remains a workhorse for molecular optimization. The question of whether generic pretraining distributions are sufficient for specialized BO domains is genuinely important and underexplored. The paper also connects to the broader trend of continued pretraining / domain adaptation for foundation models.
The paper would benefit substantially from: (1) a direct comparison of surrogate prediction quality (not just optimization performance), (2) ablations isolating the effect of each design choice, (3) experiments on at least one non-molecular domain to demonstrate generality, and (4) analysis of how performance degrades as VAE retraining introduces latent drift.
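For recommendation (1), surrogate prediction quality could be measured on held-out latent points independently of the optimization loop. The sketch below is one possible setup, not something taken from the paper: it assumes each surrogate exposes a predictive mean and standard deviation and scores them with RMSE and an average Gaussian negative log-likelihood. The Gaussian form is an assumption made for illustration; a surrogate with a non-Gaussian predictive distribution would need its own log-density.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of point predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def gaussian_nll(y_true, mean, std):
    """Average negative log-likelihood under independent Gaussian predictions."""
    total = 0.0
    for t, m, s in zip(y_true, mean, std):
        total += 0.5 * math.log(2 * math.pi * s * s) + (t - m) ** 2 / (2 * s * s)
    return total / len(y_true)
```

Reporting both numbers for the adapted and unadapted surrogate on the same held-out points would separate gains in predictive accuracy from gains in calibration.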
The framing as a workshop paper is appropriate given the current level of evidence. The idea is sound and worth investigating further, but the empirical case is not yet compelling enough for a top venue.
Generated Jun 9, 2026
Paper 2 has higher impact potential. It targets Bayesian optimization for molecules and proteins, a high-value application area, and it addresses a timely, broadly relevant problem: how to adapt tabular foundation models and in-context learning (ICL) surrogates when the pretraining prior does not match the task, as in LSBO. The proposed continued pretraining on synthetic latent-space optimization tasks, combined with an anchoring regularizer, is a transferable recipe for aligning foundation-model priors with downstream decision-making tasks, and it is likely to influence BO, foundation models, and scientific discovery workflows. Paper 1 is useful for high-dimension, low-sample-size (HDLSS) tabular synthesis (e.g., omics) but is more niche and incremental in combining block structure with existing deep generative priors.
Paper 2 has higher estimated impact due to stronger timeliness and broader applicability: it connects foundation-model in-context learning with Bayesian optimization for molecule/protein design, a highly active area with clear real-world relevance. The proposed LSBO-specific continued pretraining and anchoring regularizer is a pragmatic, general recipe that could transfer to many latent-design settings and spur follow-on work across ML, optimization, and computational chemistry. Paper 1 is technically solid and novel for Fokker–Planck operator learning, but its impact is more specialized to stochastic PDE/SDE communities and narrower in immediate application scope.
Paper 2 demonstrates higher potential scientific impact due to its direct applicability to critical real-world problems like molecular design and protein engineering. By bridging the highly timely fields of in-context learning foundation models and latent-space Bayesian optimization, it offers a practical tool for scientific discovery across chemistry and biology. While Paper 1 provides rigorous theoretical insights into neural network learning dynamics, Paper 2's methodological innovation solves a practical distribution mismatch problem, paving the way for immediate, broad impact in applied sciences and AI-driven drug discovery.
Paper 2 addresses a more impactful problem at the intersection of foundation models and Bayesian optimization for molecular/protein design, which is a rapidly growing field with significant real-world applications in drug discovery and materials science. It introduces a novel continued-pretraining strategy to address distribution mismatch in latent-space BO, combining multiple trending research areas (foundation models, in-context learning, molecular optimization). Paper 1 provides useful but incremental technical guidance on parameter conversion between privacy frameworks, with narrower scope and more limited applicability.
Paper 1 offers higher potential scientific impact due to its methodological innovation and broader applicability. By adapting tabular foundation models for latent-space Bayesian optimization, it addresses a fundamental distribution shift problem, advancing the frontiers of in-context learning for sample-efficient design. This approach can be applied across diverse domains like drug discovery and materials science. In contrast, Paper 2 provides a highly optimized, domain-specific engineering pipeline for chemical retrosynthesis. While valuable, it largely relies on combining established techniques (Transformers, LambdaMART), offering narrower theoretical and cross-disciplinary impact.
Paper 1 introduces a fundamentally novel approach to disentanglement using holographic reduced representations, combining symbolic AI concepts with neural networks in an innovative way. It provides both empirical results and rigorous information-theoretic analysis, including capacity bounds. The work bridges multiple fields (representation learning, cognitive science, information theory) and addresses a long-standing challenge. Paper 2 makes a more incremental contribution—adapting pretraining distributions for latent-space BO surrogates—which, while useful, is narrower in scope and more application-specific with less foundational theoretical insight.
Paper 2 addresses a timely intersection of foundation models, Bayesian optimization, and molecular design—areas of high current interest. Its novel approach of adapting tabular foundation model surrogates for latent-space BO through continued pretraining with domain-specific synthetic tasks is innovative and has direct applications in drug discovery and materials science. Paper 1, while useful as a benchmark contribution for symbolic regression, is more incremental—it improves evaluation methodology rather than introducing new algorithmic capabilities. Benchmarks have impact but typically less than methodological advances with clear real-world applications in high-impact domains like molecular optimization.
Paper 2 addresses a novel intersection of two rapidly growing fields—foundation models and Bayesian optimization—with broad applications in molecular and protein design. Its contribution of adapting tabular foundation model surrogates for latent-space BO via continued pretraining is timely and innovative, with clear real-world applications in drug discovery and materials science. Paper 1, while solid, offers incremental improvements to prototype rehearsal in exemplar-free continual learning, a narrower subfield. Paper 2's cross-disciplinary relevance (ML, chemistry, biology) and connection to the foundation model paradigm give it higher potential impact.
Paper 1 introduces a novel geometric framework for understanding transformer representations that is model-agnostic, probe-free, and applicable across architectures. Its findings (attractor dynamics, curvature encoding complexity, trajectory bifurcation, universal three-phase structure) are broadly relevant to the entire mechanistic interpretability community and potentially to all transformer-based AI systems. Paper 2 makes a solid but narrower contribution—adapting tabular foundation models for latent-space Bayesian optimization in molecular design. While useful, it addresses a more specialized problem with incremental methodological innovation (continued pretraining with regularization). Paper 1's breadth of impact across interpretability, neuroscience-inspired AI analysis, and its open-source toolkit give it higher potential impact.
Paper 2 has higher estimated impact due to broader applicability and timeliness: robust drift detection and diagnosis is central to deploying continual learning systems on real-world streams (robotics, surveillance, autonomous driving, data-centric MLOps). Leveraging frozen large pretrained models for decoupled, zero-shot monitoring is a novel, generally reusable system concept that can plug into many task-free continual learning (TFCL) methods and modalities, and it is likely to influence multiple subfields (continual learning, distribution shift, foundation-model tooling). Paper 1 is solid and innovative but more specialized to LSBO, tabular ICL surrogates, and molecular latent spaces, which limits its breadth.