Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan
Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.
This paper introduces LOCK (Locally Linear Correlation Kernels) and CLOCK (structure-Conditioned LOCK), a family of Gaussian process kernels designed for protein property prediction.
The paper is exceptionally thorough in its experimental evaluation.
One concern is that the concentrated likelihood used for CLOCK training (profiling out kernel scale) makes an implicit assumption about homogeneity across landscapes that, while empirically validated here, may not generalize to all settings.
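For readers unfamiliar with the term, the generic idea of a concentrated (profile) likelihood can be sketched as follows. This is a minimal single-dataset illustration in NumPy, not the CLOCK training objective: the function names, the random covariance `K`, and the data `y` are all hypothetical. For y ~ N(0, s² K), the maximizing scale has the closed form ŝ² = yᵀK⁻¹y / n, so s² can be substituted out and only the remaining kernel parameters need to be optimized.

```python
import numpy as np

def gp_log_likelihood(y, K, scale2):
    """Log-density of y under N(0, scale2 * K)."""
    n = len(y)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)               # L^{-1} y
    quad = alpha @ alpha                        # y^T K^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log |K|
    return -0.5 * (n * np.log(2 * np.pi * scale2) + logdet + quad / scale2)

def concentrated_log_likelihood(y, K):
    """Same likelihood with the scale replaced by its closed-form maximizer."""
    n = len(y)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    scale2_hat = (alpha @ alpha) / n            # argmax over scale2
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    # At scale2_hat the quadratic term quad / scale2 equals n.
    value = -0.5 * (n * np.log(2 * np.pi * scale2_hat) + logdet + n)
    return value, scale2_hat

# Hypothetical toy data: a random positive-definite K and random targets.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
K = A @ A.T + np.eye(6)
y = rng.normal(size=6)

cll, s2 = concentrated_log_likelihood(y, K)
print("profiled scale:", s2, "concentrated log-likelihood:", cll)
```

In this sketch the scale is profiled for one dataset at a time; how the scale is shared or profiled across landscapes in CLOCK is exactly the point the concern above is about, and is not modeled here.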
Immediate applications: The LOCK-GP is directly useful for protein engineering campaigns where experimental data is scarce (the most common scenario). Its data efficiency is practically significant: it outperforms foundation model-based methods when trained on only 48-192 points. The uncertainty estimates enable Thompson sampling for Bayesian optimization in directed evolution.
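To make the Thompson sampling remark concrete, one selection step with a GP posterior might look like the sketch below. It rests on several assumptions: the kernel is a simple Hamming-distance stand-in (not the LOCK kernel), sequences are encoded as integer arrays, and all names (`hamming_kernel`, `gp_posterior`, `thompson_pick`) and data are hypothetical.

```python
import numpy as np

def hamming_kernel(X1, X2, length_scale=2.0):
    """Stand-in kernel exp(-Hamming distance / length_scale); NOT the LOCK kernel."""
    d = (X1[:, None, :] != X2[None, :, :]).sum(axis=-1)
    return np.exp(-d / length_scale)

def gp_posterior(X_train, y_train, X_cand, noise=1e-2):
    """Zero-mean GP posterior mean and covariance at the candidate sequences."""
    K = hamming_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = hamming_kernel(X_cand, X_train)
    K_ss = hamming_kernel(X_cand, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s @ alpha
    V = np.linalg.solve(L, K_s.T)
    cov = K_ss - V.T @ V
    return mean, cov

def thompson_pick(mean, cov, rng, jitter=1e-8):
    """Draw one joint posterior sample and return the index of its maximum."""
    sample = rng.multivariate_normal(mean, cov + jitter * np.eye(len(mean)))
    return int(np.argmax(sample))

# Hypothetical toy campaign: 8 measured variants, 30 candidates, length-5 sequences.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 20, size=(8, 5))
y_train = rng.normal(size=8)
X_cand = rng.integers(0, 20, size=(30, 5))

mean, cov = gp_posterior(X_train, y_train, X_cand)
next_idx = thompson_pick(mean, cov, rng)
print("next candidate to measure:", X_cand[next_idx])
```

Because the choice is the argmax of a random posterior draw rather than of the posterior mean, candidates with high uncertainty are sometimes selected, which is what makes calibrated uncertainty estimates matter in this setting.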
Multi-task learning: CLOCK-GP's ability to learn transferable structure-conditioned substitution matrices from as few as 10 landscapes, then deploy them zero-shot on new landscapes, addresses a real bottleneck in protein engineering where related landscapes are available but individual datasets are small.
Broader methodological influence: The paper demonstrates that carefully constructed, domain-informed kernels can outperform foundation model embeddings in low-data regimes—a finding with implications beyond proteins. The infinite divisibility observation could inspire similar kernel constructions in other biological sequence domains (DNA, RNA).
Computational efficiency: LOCK-GP is 2-140× faster at inference than foundation model-based alternatives (Table 10), making it attractive for iterative design loops.
This work arrives at an important moment. The field is heavily invested in ever-larger foundation models (ESM-2, SaProt, etc.), and there is growing recognition that these models don't always translate to downstream supervised performance (Li et al. 2024, Vieira et al. 2025). LOCK-GP provides a principled, lightweight alternative that challenges the assumption that more parameters and more pretraining always win. The fact that 210 BLOSUM parameters frequently outperform 650M ESM-2 parameters is a striking and timely result.
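The figure of 210 is consistent with the number of free entries in a symmetric matrix over the 20 standard amino acids (diagonal included). Whether the paper counts the parameters in exactly this way is an assumption here; the arithmetic itself is easy to check:

```python
# Free entries of a symmetric n x n matrix: n * (n + 1) / 2.
n_amino_acids = 20
n_free = n_amino_acids * (n_amino_acids + 1) // 2
print(n_free)  # 210
```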
The multi-task CLOCK framework also addresses the emerging need for transferable protein property models as high-throughput experimental data (e.g., mega-scale DMS) becomes more available.
The paper's analysis of Kermut kernel pathologies (Sec. A.13), namely combinatorial explosion, wild-type degeneracy, and non-intuitive distance behavior, is a valuable contribution to the community's understanding of existing methods. The interpretability of CLOCK correlations (Figs. 6 and 9), which recover known structural biology patterns (proline at helix caps, arginine on surfaces), provides satisfying validation.
The proof-of-concept LoRA fine-tuning of CLOCK across property types (thermostability → capsid viability) hints at exciting future directions for transfer learning with minimal parameters.
Generated Jun 10, 2026
nD-RoPE addresses a fundamental limitation in position embedding for Transformers—one of the most widely used architectures in AI. Its unified theoretical framework for extending RoPE to arbitrary dimensions has broad applicability across vision, video, 3D point clouds, and potentially other modalities. The mathematical rigor (Hilbert space formulation, spectral isotropy conditions) combined with demonstrated empirical gains across multiple domains suggests high adoption potential. Paper 2 makes a solid contribution to protein property prediction with novel kernels, but targets a narrower application domain with a smaller potential user base.
Paper 2 addresses a critical and timely bottleneck in RL training for LLMs, a topic of immense current interest. It provides both theoretical insights (entropy bounds on MTP acceptance) and practical solutions (novel TV loss, rejection sampling recipes) with substantial empirical validation across multiple model scales and tasks, achieving a 1.8x speedup. Its breadth of impact across the LLM training ecosystem is significant. Paper 1, while solid and useful for protein engineering, operates in a narrower domain with incremental kernel-based improvements over existing methods, and its impact is more specialized to computational biology.
Paper 2 proposes a fundamental, unifying theoretical framework for interpretable machine learning, a critical bottleneck in modern AI. By grounding interpretability in Lagrangian mechanics, it offers a highly novel, deductive approach that spans traditional, concept-based, and mechanistic interpretability. While Paper 1 provides a valuable, data-efficient methodological advance for computational biology, Paper 2 has a significantly broader potential impact, as it addresses a fragmented landscape in AI theory and could influence model design, safety, and evaluation across nearly all AI application domains.
Paper 2 addresses a fundamental challenge in protein engineering with a novel kernel method that combines evolutionary substitution matrices with structural information. It offers broad applicability across protein design, drug discovery, and biotechnology. The method's data efficiency and ability to outperform foundation model embeddings is particularly impactful. Paper 1, while addressing an important AI safety problem, is more niche—focused on a specific monitoring protocol within AI control—and its impact depends on the adoption of particular AI deployment frameworks. Paper 2's contributions are more broadly applicable across multiple scientific disciplines.
Paper 2 addresses a fundamental challenge in protein engineering—predicting properties from sparse data—with a novel kernel approach that outperforms foundation model embeddings. Its impact spans computational biology, drug design, and protein engineering, offering practical data-efficient methods for real-world protein design. Paper 1, while technically sound, represents an incremental improvement in speech-LLM integration within a crowded field. Paper 2's novelty in combining evolutionary substitution matrices with structural information and its broader cross-disciplinary applicability give it higher long-term scientific impact.
Paper 2 introduces novel sequence kernels for protein property prediction that combine evolutionary substitution matrices with Gaussian processes, addressing a fundamental challenge in protein design. It outperforms foundation model embeddings in data-efficient settings and introduces structure-conditioned kernels for multi-task learning—both methodologically innovative contributions with broad applicability across computational biology, drug design, and protein engineering. Paper 1, while rigorous, addresses a narrow operational problem at a single airport with incremental ML benchmarking and limited generalizability beyond aviation operations.
Paper 2 introduces a novel and practical methodology for protein property prediction that addresses a significant challenge in protein design. Its contributions—evolutionary substitution matrix-based kernels, structure-aware learning, and multi-task capabilities—have broad applicability in biotechnology, drug design, and protein engineering. Paper 1, while methodologically rigorous and offering important cautionary insights about interpretability practices in MoE models, is primarily a negative result with narrower scope, showing that observational metrics don't predict causal importance. Paper 2's positive, constructive contribution with demonstrated practical utility gives it higher potential for real-world impact and broader adoption.
Paper 2 addresses a practical, high-impact problem in protein engineering with a novel kernel design that incorporates evolutionary and structural information. It demonstrates clear empirical advantages over foundation model embeddings for data-efficient protein property prediction, which has immediate applications in drug design and protein engineering. Paper 1, while theoretically rigorous in unifying GP-UCB and DEC frameworks for kernel bandits, addresses a narrower theoretical audience. Paper 2's breadth of impact across computational biology, machine learning, and biotechnology, combined with its timeliness given the rise of protein foundation models, gives it higher potential scientific impact.
Paper 1 has higher potential impact: it challenges a widely assumed constant in neural scaling laws by showing optimizer-dependent exponents, which could materially change compute/performance forecasting and optimizer selection across modern ML. The work is timely and broadly relevant (theory + practical implications for LLM training), and introduces a diagnostic linking spectral conditions to optimizer gains. Paper 2 is valuable for protein engineering and data-efficient GP modeling, but its impact is more domain-specific and incremental relative to existing kernel/GP advances and foundation-model-based predictors.
Paper 2 addresses a fundamental challenge in reinforcement learning—how to perform policy improvement with expressive generative models without destabilizing training. The proposed test-time-only optimization paradigm (QGF) is a novel conceptual shift that sidesteps actor-critic instability, offering favorable scaling properties. This has broad implications across robotics, imitation learning, and RL at scale. Paper 1 makes solid contributions to protein property prediction with clever kernel design, but operates in a narrower niche. Paper 2's potential to influence how RL is done with diffusion/flow policies gives it broader and more timely impact.