Flexible Kernels for Protein Property Prediction

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

Jun 9, 2026 · arXiv:2606.11057v1

cs.LG · q-bio.BM · stat.ML

#976 of 5669 · cs.LG

Tournament Score: 1474±44

Win Rate: 60%

Rating: 8/10

Significance 8 · Rigor 8.5 · Novelty 7.5 · Clarity 8.5

Abstract

Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore--by learning what are in effect structure-aware substitution matrices--we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: "Flexible Kernels for Protein Property Prediction"

1. Core Contribution

This paper introduces LOCK (Locally Linear Correlation Kernels) and CLOCK (structure-Conditioned LOCK), a family of Gaussian process kernels designed for protein property prediction. The key innovations are:

Exploitation of infinitely divisible substitution matrices: The authors make the mathematically grounded observation that BLOSUM substitution matrices are infinitely divisible (Hadamard powers preserve PSD-ness), enabling learnable exponents as kernel hyperparameters. This is an elegant way to parameterize amino acid similarity with a single pre-existing 20×20 matrix rather than millions of foundation model parameters.

Local linearity construction: The product kernel k_nl × k_lin creates locally linear function spaces—linear near training data but mean-reverting far away—which is biophysically motivated by the approximate additivity of protein mutations.

CLOCK extension: Structure embeddings from foundation models are mapped to position-specific correlation matrices via a learned linear projection W, creating zero-shot structure-conditioned kernels with ~49k parameters that enable multi-task learning across landscapes.
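The first two ingredients above can be illustrated with a minimal sketch. It does not use real BLOSUM values or the paper's exact kernel definitions: the matrix S below is a synthetic, infinitely divisible stand-in, and the forms of k_nl and k_lin (a per-position product and a per-position mean) are assumptions chosen only to show that Hadamard powers and the product kernel stay positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 20x20 amino-acid similarity matrix (NOT BLOSUM values).
# S = exp(-0.5 * squared distance) is infinitely divisible: S**t is again a
# Gaussian kernel matrix, hence PSD, for every t > 0.
N_AA = 20
feats = rng.normal(size=(N_AA, 4))
sq_dist = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-0.5 * sq_dist)

def min_eig(M):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(M).min())

# Hadamard (element-wise) powers remain PSD for fractional and large exponents.
exponents = (0.1, 0.5, 1.0, 3.0)
power_min_eigs = {t: min_eig(S ** t) for t in exponents}

def k_nl(x, y, t):
    """Assumed nonlinear factor: product over positions of powered similarities."""
    return float(np.prod(S[x, y] ** t))

def k_lin(x, y):
    """Assumed linear factor: mean per-position similarity (a sum of per-site kernels)."""
    return float(np.mean(S[x, y]))

def k_product(x, y, t=0.5):
    """Product kernel k_nl * k_lin; PSD by the Schur product theorem."""
    return k_nl(x, y, t) * k_lin(x, y)

# Gram matrix on a few random length-8 sequences encoded as integer arrays.
seqs = rng.integers(0, N_AA, size=(12, 8))
gram = np.array([[k_product(a, b) for b in seqs] for a in seqs])
print(power_min_eigs, min_eig(gram))
```

In a GP, the exponent t would be treated as a learnable hyperparameter; here it is fixed purely for illustration.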

2. Methodological Rigor

The paper is exceptionally thorough in its experimental evaluation:

Comprehensive benchmarking: 30+ models across 21 datasets in three evaluation regimes (cross-validation, extrapolation, unseen mutations), with careful train/test split construction.

Systematic ablations: The ablation study (Sec. 5.2, Tables 4-6) convincingly isolates the contribution of each component—BLOSUM incorporation, local exponents, local linearity, and hyperparameter priors.

Theoretical grounding: The infinite divisibility characterization (Appendix A.1) is mathematically rigorous, connecting BLOSUM matrices to well-studied results from harmonic analysis (Schoenberg's theorem, Berg et al. 1984). The local linearity derivation is clean and illuminating.

Uncertainty evaluation: CRPS and NLL metrics are reported alongside standard correlation metrics, and LOCK-GP consistently shows superior calibration.
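For readers unfamiliar with the two calibration metrics named above, the following sketch gives their standard closed forms for a single Gaussian predictive distribution, which is what a GP produces at each test point. How the paper aggregates or normalizes these scores is not stated here, so the function names and the per-point form are assumptions.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + 0.5 * ((y - mu) / sigma) ** 2

def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) against observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# Same prediction error, different stated uncertainty: the overconfident
# forecast is penalized by both metrics.
calibrated = (gaussian_crps(1.0, 0.0, 1.0), gaussian_nll(1.0, 0.0, 1.0))
overconfident = (gaussian_crps(1.0, 0.0, 0.1), gaussian_nll(1.0, 0.0, 0.1))
print(calibrated, overconfident)
```

As the predictive standard deviation shrinks to zero, CRPS reduces to the absolute error, which is why it is often read as a probabilistic generalization of MAE.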

One concern is that the concentrated likelihood used for CLOCK training (profiling out kernel scale) makes an implicit assumption about homogeneity across landscapes that, while empirically validated here, may not generalize to all settings.
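For context, "profiling out" a kernel scale has a standard closed form in the single-dataset case, sketched below. The notation is an assumption: y is a vector of N observations, K_theta is the kernel matrix (with any relative noise term folded in), and sigma^2 is the overall scale. How the paper applies this across multiple landscapes is not specified in this assessment.

```latex
\begin{align}
\log p(y \mid \theta, \sigma^2)
  &= -\tfrac{N}{2}\log(2\pi\sigma^2)
     - \tfrac{1}{2}\log\lvert K_\theta\rvert
     - \tfrac{1}{2\sigma^2}\, y^\top K_\theta^{-1} y, \\
\hat{\sigma}^2(\theta)
  &= \tfrac{1}{N}\, y^\top K_\theta^{-1} y, \\
\log p\bigl(y \mid \theta, \hat{\sigma}^2(\theta)\bigr)
  &= -\tfrac{N}{2}\log\bigl(2\pi\hat{\sigma}^2(\theta)\bigr)
     - \tfrac{1}{2}\log\lvert K_\theta\rvert
     - \tfrac{N}{2}.
\end{align}
```

The second line follows from setting the derivative of the first with respect to sigma^2 to zero; substituting it back gives the concentrated objective, which depends only on the remaining hyperparameters theta.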

3. Potential Impact

Immediate applications: The LOCK-GP is directly useful for protein engineering campaigns where experimental data is scarce (the most common scenario). Its data efficiency—outperforming foundation model-based methods with 48-192 training points—is practically significant. The uncertainty estimates enable Thompson sampling for Bayesian optimization in directed evolution.

Multi-task learning: CLOCK-GP's ability to learn transferable structure-conditioned substitution matrices from as few as 10 landscapes, then deploy them zero-shot on new landscapes, addresses a real bottleneck in protein engineering where related landscapes are available but individual datasets are small.

Broader methodological influence: The paper demonstrates that carefully constructed, domain-informed kernels can outperform foundation model embeddings in low-data regimes—a finding with implications beyond proteins. The infinite divisibility observation could inspire similar kernel constructions in other biological sequence domains (DNA, RNA).

Computational efficiency: LOCK-GP is 2-140× faster at inference than foundation model-based alternatives (Table 10), making it attractive for iterative design loops.
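The Thompson sampling use mentioned under immediate applications can be sketched as follows. This is a generic illustration, not the paper's procedure: it uses a squared-exponential kernel on 1-D inputs as a stand-in for LOCK, and the helper names (rbf, gp_posterior, thompson_pick) are hypothetical.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel on 1-D inputs (stand-in, not LOCK)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-2):
    """Posterior mean and covariance at candidate points for a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_cand)
    L = np.linalg.cholesky(K)  # cubic-cost step in the number of training points
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    V = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf(x_cand, x_cand) - V.T @ V

def thompson_pick(mean, cov, rng):
    """Draw one joint posterior sample and return the index of its maximum."""
    w, U = np.linalg.eigh((cov + cov.T) / 2.0)
    w = np.clip(w, 0.0, None)  # guard against round-off negatives
    sample = mean + U @ (np.sqrt(w) * rng.standard_normal(len(mean)))
    return int(np.argmax(sample))

rng = np.random.default_rng(0)
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(2 * np.pi * x_train)
x_cand = np.linspace(0.0, 1.0, 25)

mean, cov = gp_posterior(x_train, y_train, x_cand)
picks = [thompson_pick(mean, cov, rng) for _ in range(5)]
print(picks)  # candidate indices proposed for the next round of experiments
```

Each call proposes one candidate; repeated draws spread proposals across regions that are either promising or uncertain, which is the exploration behavior that calibrated uncertainty estimates make possible.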

4. Timeliness & Relevance

This work arrives at an important moment. The field is heavily invested in ever-larger foundation models (ESM-2, SaProt, etc.), and there is growing recognition that these models don't always translate to downstream supervised performance (Li et al. 2024, Vieira et al. 2025). LOCK-GP provides a principled, lightweight alternative that challenges the assumption that more parameters and pretraining always wins. The fact that 210 BLOSUM parameters frequently outperform 650M ESM-2 parameters is a striking and timely result.

The multi-task CLOCK framework also addresses the emerging need for transferable protein property models as high-throughput experimental data (e.g., mega-scale DMS) becomes more available.

5. Strengths & Limitations

Key Strengths:

Remarkable parameter efficiency (210 parameters vs. millions) with competitive or superior performance

Principled mathematical foundation connecting classical bioinformatics (BLOSUM) to modern GP methodology

Best-in-class uncertainty quantification (CRPS, NLL) across all regimes

Clean, modular kernel design that naturally accommodates structural information

Very thorough experimental evaluation with 21 datasets, multiple regimes, and extensive ablations

Open-source implementation

Notable Limitations:

Fixed-length/aligned sequences only: The kernel requires aligned sequences of fixed length, excluding many important applications (e.g., insertions/deletions beyond gap tokens, comparing proteins of different families).

Epistasis handling: As acknowledged (Sec. A.4), LOCK captures smooth context-dependent epistasis but may struggle with strong specific epistatic interactions (demonstrated on GB1, Table 3).

Cubic scaling: Standard GP inference scales as O(N³), limiting applicability to large datasets, though the authors discuss mitigations (Sec. A.11).

CLOCK requires multiple landscapes: The structure-conditioned kernel needs training landscapes to learn W; the fine-tuning experiment (Sec. A.9) is proof-of-concept only.

Evaluation scope: Most datasets are DMS-style with mutations concentrated near a reference sequence; performance on more divergent sequence sets is less clear.

6. Additional Observations

The paper's analysis of Kermut kernel pathologies (Sec. A.13)—combinatorial explosion, wild-type degeneracy, non-intuitive distance behavior—is a valuable contribution to the community's understanding of existing methods. The interpretability of CLOCK correlations (Fig. 6, 9) showing known structural biology patterns (proline at helix caps, arginine on surfaces) provides satisfying validation.

The proof-of-concept LoRA fine-tuning of CLOCK across property types (thermostability → capsid viability) hints at exciting future directions for transfer learning with minimal parameters.

Rating: 8/10

Significance 8 · Rigor 8.5 · Novelty 7.5 · Clarity 8.5

Generated Jun 10, 2026

Comparison History (20)

Lost vs. nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

nD-RoPE addresses a fundamental limitation in position embedding for Transformers—one of the most widely used architectures in AI. Its unified theoretical framework for extending RoPE to arbitrary dimensions has broad applicability across vision, video, 3D point clouds, and potentially other modalities. The mathematical rigor (Hilbert space formulation, spectral isotropy conditions) combined with demonstrated empirical gains across multiple domains suggests high adoption potential. Paper 2 makes a solid contribution to protein property prediction with novel kernels, but targets a narrower application domain with a smaller potential user base.

claude-opus-4-6 · Jun 11, 2026

Lost vs. Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Paper 2 addresses a critical and timely bottleneck in RL training for LLMs—a topic of immense current interest. It provides both theoretical insights (entropy bounds on MTP acceptance) and practical solutions (novel TV loss, rejection sampling recipes) with substantial empirical validation across multiple model scales and tasks, achieving 1.8x speedup. Its breadth of impact across the LLM training ecosystem is significant. Paper 1, while solid and useful for protein engineering, operates in a narrower domain with incremental kernel-based improvements over existing methods, and its impact is more specialized to computational biology.

claude-opus-4-6 · Jun 11, 2026

Lost vs. The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

Paper 2 proposes a fundamental, unifying theoretical framework for interpretable machine learning, a critical bottleneck in modern AI. By grounding interpretability in Lagrangian mechanics, it offers a highly novel, deductive approach that spans traditional, concept-based, and mechanistic interpretability. While Paper 1 provides a valuable, data-efficient methodological advance for computational biology, Paper 2 has a significantly broader potential impact, as it addresses a fragmented landscape in AI theory and could influence model design, safety, and evaluation across nearly all AI application domains.

gemini-3.1-pro-preview · Jun 11, 2026

Won vs. Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents

Paper 2 addresses a fundamental challenge in protein engineering with a novel kernel method that combines evolutionary substitution matrices with structural information. It offers broad applicability across protein design, drug discovery, and biotechnology. The method's data efficiency and ability to outperform foundation model embeddings is particularly impactful. Paper 1, while addressing an important AI safety problem, is more niche—focused on a specific monitoring protocol within AI control—and its impact depends on the adoption of particular AI deployment frameworks. Paper 2's contributions are more broadly applicable across multiple scientific disciplines.

claude-opus-4-6 · Jun 11, 2026

Won vs. AuRA: Internalizing Audio Understanding into LLMs as LoRA

Paper 2 addresses a fundamental challenge in protein engineering—predicting properties from sparse data—with a novel kernel approach that outperforms foundation model embeddings. Its impact spans computational biology, drug design, and protein engineering, offering practical data-efficient methods for real-world protein design. Paper 1, while technically sound, represents an incremental improvement in speech-LLM integration within a crowded field. Paper 2's novelty in combining evolutionary substitution matrices with structural information and its broader cross-disciplinary applicability give it higher long-term scientific impact.

claude-opus-4-6 · Jun 10, 2026

Won vs. Data-Driven Runway and Taxiway Exits Prediction of Landing Aircraft: A Case Study at Hartsfield-Jackson Atlanta International Airport

Paper 2 introduces novel sequence kernels for protein property prediction that combine evolutionary substitution matrices with Gaussian processes, addressing a fundamental challenge in protein design. It outperforms foundation model embeddings in data-efficient settings and introduces structure-conditioned kernels for multi-task learning—both methodologically innovative contributions with broad applicability across computational biology, drug design, and protein engineering. Paper 1, while rigorous, addresses a narrow operational problem at a single airport with incremental ML benchmarking and limited generalizability beyond aviation operations.

claude-opus-4-6 · Jun 10, 2026

Won vs. From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models

Paper 2 introduces a novel and practical methodology for protein property prediction that addresses a significant challenge in protein design. Its contributions—evolutionary substitution matrix-based kernels, structure-aware learning, and multi-task capabilities—have broad applicability in biotechnology, drug design, and protein engineering. Paper 1, while methodologically rigorous and offering important cautionary insights about interpretability practices in MoE models, is primarily a negative result with narrower scope, showing that observational metrics don't predict causal importance. Paper 2's positive, constructive contribution with demonstrated practical utility gives it higher potential for real-world impact and broader adoption.

claude-opus-4-6 · Jun 10, 2026

Won vs. Algorithmic and Minimax Complexities in Kernel Bandits

Paper 2 addresses a practical, high-impact problem in protein engineering with a novel kernel design that incorporates evolutionary and structural information. It demonstrates clear empirical advantages over foundation model embeddings for data-efficient protein property prediction, which has immediate applications in drug design and protein engineering. Paper 1, while theoretically rigorous in unifying GP-UCB and DEC frameworks for kernel bandits, addresses a narrower theoretical audience. Paper 2's breadth of impact across computational biology, machine learning, and biotechnology, combined with its timeliness given the rise of protein foundation models, gives it higher potential scientific impact.

claude-opus-4-6 · Jun 10, 2026

Lost vs. On the Optimizer Dependence of Neural Scaling Laws

Paper 1 has higher potential impact: it challenges a widely assumed constant in neural scaling laws by showing optimizer-dependent exponents, which could materially change compute/performance forecasting and optimizer selection across modern ML. The work is timely and broadly relevant (theory + practical implications for LLM training), and introduces a diagnostic linking spectral conditions to optimizer gains. Paper 2 is valuable for protein engineering and data-efficient GP modeling, but its impact is more domain-specific and incremental relative to existing kernel/GP advances and foundation-model-based predictors.

gpt-5.2 · Jun 10, 2026

Lost vs. Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

Paper 2 addresses a fundamental challenge in reinforcement learning—how to perform policy improvement with expressive generative models without destabilizing training. The proposed test-time-only optimization paradigm (QGF) is a novel conceptual shift that sidesteps actor-critic instability, offering favorable scaling properties. This has broad implications across robotics, imitation learning, and RL at scale. Paper 1 makes solid contributions to protein property prediction with clever kernel design, but operates in a narrower niche. Paper 2's potential to influence how RL is done with diffusion/flow policies gives it broader and more timely impact.

claude-opus-4-6 · Jun 10, 2026
