Few-step Cofolding with All-Atom Flow Maps

Gianluca Scarpellini, Ron Shprints, Peter Holderrieth, Juno Nam, Pranav Murugan, Rafael Gómez-Bombarelli, Tommi Jaakkola, Maruan Al-Shedivat

Jun 7, 2026 · arXiv:2606.08375v1

cs.LG

#329 of 5669 · cs.LG

Tournament Score: 1520 ± 43 (scale shown: 1050 to 1750)

Win Rate: 76%

Rating: 7.8 / 10

Significance 8 · Rigor 8 · Novelty 7 · Clarity 7.5

Abstract

All-atom generative modeling of 3D biomolecular complexes has emerged as the dominant paradigm for predicting the structure of proteins and protein-ligand systems. Generating structures at the atomic level of fidelity, however, typically requires expensive iterative diffusion rollouts, making both conventional deployment and inference-time search techniques computationally costly. In this paper, we introduce the Denoiser Cofolding All-Atom Flowmap (DeCAF) framework for distilling state-of-the-art all-atom cofolding models into all-atom flow maps that produce high-quality samples in only a few inference steps. We build DeCAF on a denoiser-based formulation of flow maps with endpoint losses that naturally support SE(3) rigid alignment, which we show is critical for training accurate models. We further derive a simple change of variables that lets DeCAF operate in the σ-space noise schedule of EDM-style architectures, enabling direct distillation from pretrained cofolding diffusion models. Equipped with DeCAF's flowmap lookahead, we introduce a purpose-built inference-time framework that improves sampling through reward-guided search. Empirically, DeCAF-Boltz shows statistically significant improvements over Boltz-1x in both accuracy (RMSD) and physical validity scores of protein-ligand poses at strict NFE budgets on the challenging Runs N' Poses benchmark, while also showing a better Pareto frontier across all inference compute budgets on PoseBusters. Distilling the state-of-the-art Pearl cofolding model, DeCAF-Pearl outperforms diffusion-based cofolding models and matches its teacher on success rate while using 5x fewer NFEs. We release our code at https://github.com/genesistherapeutics/decaf.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: Few-step Cofolding with All-Atom Flow Maps (DeCAF)

1. Core Contribution

DeCAF introduces a framework for distilling pretrained all-atom biomolecular cofolding diffusion models (specifically Boltz-1 and Pearl) into flow maps that generate high-quality protein-ligand structures in dramatically fewer inference steps (5-20× reduction in neural function evaluations). The paper makes three intertwined contributions:

First, a σ-space reparameterization of flow maps that eliminates the numerical instability arising from the chain rule through EDM noise schedules (avoiding the problematic ∂σ/∂t factor). This is paired with a denoiser-based parametrization that enables SE(3) rigid alignment via the Kabsch algorithm on predicted endpoints—shown to be critical for training stability through ablation (Table 3, where velocity and consistency distillation parametrizations catastrophically fail).

Second, DeCAF-SEARCH, an inference-time search framework that exploits the flow map's lookahead capability to evaluate terminal rewards (physical validity) on clean-space predictions rather than noisy intermediate states, unifying FK-steering, SMC resampling, and MCTS-style exploration under one umbrella.

Third, empirical validation on two challenging benchmarks (Runs N' Poses and PoseBusters) showing that DeCAF-Boltz matches full-budget Boltz-1x (600 NFE) with 20× fewer evaluations, and DeCAF-Pearl matches Pearl's success rate with 5× fewer NFEs.
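To make the first contribution concrete, here is a minimal NumPy sketch of an endpoint loss with Kabsch rigid alignment. It is an illustration under assumptions, not the paper's implementation: the function names `kabsch_align` and `aligned_endpoint_mse` are hypothetical, and aligning the reference onto the prediction (rather than the reverse) is an assumed convention.

```python
import numpy as np

def kabsch_align(mobile, fixed):
    """Rigidly superimpose `mobile` (N, 3) onto `fixed` (N, 3) with the Kabsch algorithm."""
    mob_mean = mobile.mean(axis=0)
    fix_mean = fixed.mean(axis=0)
    mob_c = mobile - mob_mean
    fix_c = fixed - fix_mean
    H = mob_c.T @ fix_c                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking mob_c onto fix_c
    return mob_c @ R.T + fix_mean

def aligned_endpoint_mse(pred_x0, ref_x0):
    """Mean squared distance per atom after aligning the reference onto the prediction.

    Assumption: the alignment is applied to the reference; in a training loop it
    would typically be treated as a constant with respect to gradients.
    """
    ref_aligned = kabsch_align(ref_x0, pred_x0)
    return float(np.mean(np.sum((pred_x0 - ref_aligned) ** 2, axis=-1)))
```

Because the loss compares predicted clean coordinates, a global rotation or translation of the reference leaves it unchanged, which is the property the assessment attributes to the denoiser parametrization.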

2. Methodological Rigor

The technical derivations are sound. The σ-space reformulation (Eq. 6-7) is a clean change of variables, and the connection between the two-time denoiser and the mean-flow/Eulerian objective (Eq. 8-10) is mathematically well-grounded. The endpoint loss with SE(3) alignment (Eq. 11) is a natural extension of standard practices in biomolecular diffusion models.
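The role of the ∂σ/∂t factor can be illustrated with a short derivation. The notation below is illustrative and is not taken from the paper's Eq. 6-7; it assumes a monotone schedule σ(t) and writes the same flow map in two indexings.

```latex
% Illustrative notation only (assumption): X is the time-indexed flow map,
% \tilde{X} is the same map indexed by noise level.
\begin{align}
X_{s,t}(x) &= \tilde{X}_{\sigma(s),\,\sigma(t)}(x), \\
\partial_t X_{s,t}(x) &= \frac{d\sigma}{dt}(t)\;
  \partial_{\sigma'} \tilde{X}_{\sigma(s),\,\sigma'}(x)\Big|_{\sigma'=\sigma(t)} .
\end{align}
% For an EDM-style schedule with exponent \rho and t \in [0,1]:
\begin{align}
\sigma(t) &= \Big(\sigma_{\max}^{1/\rho}
  + t\,\big(\sigma_{\min}^{1/\rho}-\sigma_{\max}^{1/\rho}\big)\Big)^{\rho}, \\
\frac{d\sigma}{dt} &= \rho\,\big(\sigma_{\min}^{1/\rho}-\sigma_{\max}^{1/\rho}\big)\,
  \sigma^{(\rho-1)/\rho}.
\end{align}
```

Under this schedule the magnitude of dσ/dt varies by a factor of (σ_max/σ_min)^((ρ-1)/ρ) between the two ends of the trajectory. A loss written directly in terms of ∂/∂σ never multiplies by that factor, which is plausibly the instability the σ-space formulation avoids.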

The experimental design is thorough:

Statistical testing: Paired Wilcoxon signed-rank tests with significance levels reported (p < 0.001 on most comparisons), which is commendable and uncommon in this literature.

Fair comparisons: The authors share architecture, training data, and pretraining with Boltz-1, isolating the effect of the distillation framework. They also contribute a tuned Boltz-1x baseline (η=1.0 for few-step regimes) rather than comparing against a strawman.

Generalization analysis: Figure 2 stratifies performance by training-set similarity (PLI Q-Coverage), showing gains hold across all similarity quartiles including the most OOD bin.

Comprehensive ablations: The parametrization ablation (Table 3) decisively demonstrates the necessity of the x₀-aligned formulation. The γ-sweep (Table 6) and step-count analysis (Figure 6) add further rigor.
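For readers unfamiliar with the paired Wilcoxon signed-rank test mentioned above, the following standard-library sketch shows how such a comparison works on per-target metrics. It uses the normal approximation without tie or continuity corrections, so the p-value is approximate; the function name and the example numbers in the usage note are made up for illustration and are not the paper's data.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, p_value). Zero differences are dropped and tied |d|
    values receive average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value
```

A typical call would pass two equal-length lists of per-target RMSDs, for example `wilcoxon_signed_rank(baseline_rmsd, distilled_rmsd)`, where both list names are placeholders. The test is paired because each target is evaluated by both methods, so only the per-target differences matter.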

One limitation is that the training compute (100 epochs on 64 H200 GPUs) is substantial, though this is a one-time cost. The paper does not report training time explicitly, making cost-benefit analysis difficult.

3. Potential Impact

The practical impact is significant across several dimensions:

Drug discovery workflows: A 5-20× reduction in inference cost for cofolding directly enables virtual screening of larger ligand libraries against protein targets—a key bottleneck in computational drug discovery. The paper explicitly notes this (Section 4.1), and the numbers are compelling enough to change deployment practices.

Synthetic data generation: Faster cofolding enables generating orders of magnitude more protein-ligand complexes for training downstream scoring and affinity models, addressing a critical data bottleneck.

Inference-time search: The DeCAF-SEARCH framework provides a principled way to combine flow map lookahead with reward-guided search, which could generalize beyond cofolding to other structured prediction tasks. The observation that different search strategies (FK, MC-GRAD, MCTS) are optimal at different compute budgets (Figure 4) provides actionable guidance.

Broader methodological impact: The σ-space reparameterization for EDM-style architectures is general and could facilitate flow map distillation in other EDM-based domains. The denoiser parametrization insight (that velocity-based losses are incompatible with SE(3) alignment because subtracting translation loses a degree of freedom) is a non-obvious but important technical contribution.
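The lookahead idea behind the inference-time search discussed above can be sketched as a single reward-weighted resampling step. This is a simplified SMC-style illustration, not the DeCAF-SEARCH algorithm: `flow_map`, `reward`, and `temperature` are hypothetical stand-ins, and the paper's actual potentials and proposal mechanisms are not reproduced here.

```python
import numpy as np

def lookahead_resample(particles, flow_map, reward, temperature, rng):
    """Resample noisy particles according to rewards of their predicted clean endpoints.

    particles:   array of shape (K, D), current noisy states
    flow_map:    callable mapping one noisy state to a predicted clean state (assumed)
    reward:      callable scoring a clean state, higher is better (assumed)
    temperature: positive float; smaller values make resampling greedier
    """
    # Jump each particle to its predicted endpoint in one evaluation.
    endpoints = np.stack([flow_map(p) for p in particles])
    rewards = np.array([reward(x) for x in endpoints])
    # Softmax weights over rewards, shifted by the max for numerical stability.
    logits = rewards / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Multinomial resampling: high-reward particles are duplicated, low ones dropped.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], rewards
```

The point of the design is that the reward is computed on clean-space predictions, where quantities such as physical validity are meaningful, while the resampling acts on the noisy states that the sampler continues from.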

4. Timeliness & Relevance

This paper addresses an acute bottleneck. The all-atom cofolding paradigm (AF3, Boltz, Chai, Pearl) has become the dominant approach for biomolecular structure prediction, but inference costs of O(200) NFEs per sample remain prohibitive for production-scale applications. The recent explosion of inference-time scaling methods (FK steering, MCTS) compounds this cost. DeCAF directly targets this pain point at precisely the right moment—when the community is transitioning from "can we predict structures?" to "can we predict them efficiently enough for real-world deployment?"

The concurrent work DCFold (closed-source) validates the importance of this problem, but DeCAF's open-source release and applicability to multiple teacher models (Boltz-1, Pearl) give it broader reach.

5. Strengths & Limitations

Key Strengths:

The σ-space formulation elegantly solves a real numerical problem, not an artificial one

SE(3) rigid alignment through denoiser parametrization is a crucial insight validated by devastating ablation results

State-of-the-art results with Pearl distillation (DeCAF-Pearl) demonstrate framework generality

Open-source code release maximizes community impact

The Pareto frontier analysis (Figure 4) provides a complete picture rather than cherry-picked operating points

The non-monotone NFE-success relationship (Table 7) is honestly reported and suggests future work
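As a reference for the Pareto frontier analysis noted among the strengths, here is a small sketch of how a cost-versus-quality frontier is computed. The function name is hypothetical and the numbers in the usage note are made up, not results from the paper.

```python
def pareto_front(points):
    """Return the non-dominated (cost, score) pairs: lower cost and higher score are better."""
    front = []
    best_score = float("-inf")
    # Sort by cost ascending; among equal costs, visit the highest score first.
    for cost, score in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every point that costs no more.
        if score > best_score:
            front.append((cost, score))
            best_score = score
    return front
```

For instance, with hypothetical (NFE, success rate) pairs `[(5, 0.60), (10, 0.70), (20, 0.68), (50, 0.75), (10, 0.65)]`, the configurations at 20 NFE and the weaker one at 10 NFE are dominated and drop out of the frontier.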

Notable Limitations:

Reward function design is inherited from Boltz-1x; systematic sp2 planarity failures persist across all methods

The paper focuses on protein-ligand cofolding; extension to nucleic acids, multi-chain systems, and covalent modifications is acknowledged but untested

Training requires substantial compute (64 H200 GPUs), potentially limiting reproducibility for academic groups

The non-monotone relationship between NFE and per-target success (Table 7, where successful complexes at high NFE are not a superset of low-NFE successes) suggests the flow map may introduce artifacts not present in the teacher

DeCAF-Pearl uses a proprietary teacher model, limiting full reproducibility of those specific results

Overall Assessment

DeCAF represents a well-executed and timely contribution that solves a genuine computational bottleneck in biomolecular structure prediction. The technical innovations (σ-space reparameterization, denoiser-based SE(3)-compatible flow maps) are clean and well-validated, and the empirical results are strong, statistically rigorous, and practically meaningful. The framework's generality across teacher models and the open-source release amplify its potential impact.

Rating: 7.8 / 10

Significance 8 · Rigor 8 · Novelty 7 · Clarity 7.5

Generated Jun 9, 2026

Comparison History (17)

Won vs. K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

Paper 1 has higher potential scientific impact. It advances all-atom biomolecular complex generation by distilling expensive diffusion cofolding into few-step flow maps, adding SE(3)-aware training and an EDM-noise change-of-variables, plus reward-guided search. The method is both novel and rigorous, shows strong empirical gains on challenging structural benchmarks, and targets high-value real-world applications (drug discovery, protein–ligand docking) where compute cost is a major bottleneck. While Paper 2 is timely and useful for LLM serving efficiency, its impact is narrower and primarily incremental for deployment speed/quality tradeoffs.

gpt-5.2 · Jun 10, 2026

Won vs. TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Paper 2 addresses a critical bottleneck in computational biology and drug discovery by significantly reducing the computational cost of all-atom generative modeling for protein-ligand complexes. Achieving state-of-the-art accuracy with 5x fewer inference steps enables scalable deployment and accelerates biological research. While Paper 1 offers a valuable efficiency improvement for LLM agents, Paper 2's direct application to accelerating biomolecular structure prediction promises broader and more immediate real-world scientific impact across life sciences and medicine.

gemini-3.1-pro-preview · Jun 10, 2026

Lost vs. Perturbative Contrastive Physical Learning

Paper 2 introduces a foundational framework for training physical systems without centralized backpropagation, bridging physics and AI. This offers transformative potential for developing efficient, autonomous neuromorphic hardware. While Paper 1 provides a highly valuable and practical algorithmic acceleration for molecular modeling, Paper 2's theoretical innovation and potential to fundamentally change how physical AI systems are trained give it a broader, longer-term scientific impact across multiple disciplines.

gemini-3.1-pro-preview · Jun 9, 2026

Won vs. Preserving Plasticity in Continual Learning via Dynamical Isometry

Paper 1 addresses a critical computational bottleneck in biomolecular generative modeling. By achieving a 5x speedup in protein-ligand cofolding without sacrificing accuracy, it has immediate, high-value applications in drug discovery and structural biology. While Paper 2 offers strong theoretical insights for continual learning, Paper 1's empirical breakthrough in a highly impactful applied field gives it a more tangible and immediate scientific and real-world impact.

gemini-3.1-pro-preview · Jun 9, 2026

Won vs. STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

Paper 1 likely has higher scientific impact: it advances all-atom biomolecular generative modeling by distilling diffusion cofolding into few-step flow maps, reducing inference cost while maintaining (and sometimes improving) accuracy/validity. This directly enables broader deployment and more powerful inference-time search in drug discovery and structural biology—high-value real-world applications with cross-field relevance (ML, chemistry, biophysics). The methodological contributions (SE(3)-aware endpoint losses, σ-space change of variables, reward-guided sampling) are substantial and timely given the centrality of diffusion-based structure modeling. Paper 2 is impactful for LLM efficiency, but is more incremental within KV-compression literature.

gpt-5.2 · Jun 9, 2026

Won vs. Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

Paper 1 (DeCAF) addresses a critical computational bottleneck in protein-ligand structure prediction—a high-impact application in drug discovery. It demonstrates practical improvements over state-of-the-art models (Boltz-1, Pearl) with 5x fewer inference steps, combining flow map distillation with inference-time search. The immediate applicability to drug design and structural biology gives it broader real-world impact. Paper 2 offers interesting theoretical insights on sign lock-in during training and sub-bit compression, but addresses a narrower problem with less immediate practical significance compared to accelerating biomolecular structure prediction.

claude-opus-4-6 · Jun 9, 2026

Lost vs. An Information-Theoretic Definition for Open-Ended Learning

Paper 2 is likely to have higher scientific impact because it proposes a general, information-theoretic definition of “open-endedness” (bit-equivalent) and connects it to provable growth conditions and algorithms. This is a foundational contribution that can influence multiple areas (RL, exploration, lifelong learning, AI safety/AGI discussions) and is timely given current interest in open-ended agents. Paper 1 is strong and practically valuable for biomolecular modeling efficiency, but it is a more specialized, incremental/distillation-focused advance within an already fast-moving application domain.

gpt-5.2 · Jun 9, 2026

Won vs. Breaking the Scale Barrier: One-Shot Knowledge Transfer via Frequency Transform

Paper 2 addresses a critical bottleneck in computational structural biology—the high cost of diffusion-based all-atom biomolecular structure prediction—with a principled distillation framework (DeCAF) that achieves comparable or better accuracy in far fewer inference steps. This has immediate, high-impact applications in drug discovery and protein engineering, fields with enormous practical significance. While Paper 1 presents a creative frequency-domain approach to cross-scale knowledge transfer, its contributions are more incremental within the well-explored transfer learning space. Paper 2's novel SE(3)-aware flow map distillation and inference-time search framework represent deeper methodological contributions with broader real-world impact.

claude-opus-4-6 · Jun 9, 2026

Won vs. Closed-Form Spectral Regularization for Multi-Task Model Merging

While Paper 1 offers impressive computational speedups for AI model merging, Paper 2 tackles the critical bottleneck of inference speed in 3D biomolecular complex generation. Accelerating all-atom cofolding by 5x while maintaining or improving accuracy directly impacts real-world drug discovery and computational biology. The ability to perform rapid inference-time search in structural biology has profound implications for biotechnology and medicine, giving it a slightly higher potential for broader scientific and societal impact.

gemini-3.1-pro-preview · Jun 9, 2026

Won vs. Quantifying and Mitigating Self-Preference Bias of LLM Judges

Paper 2 (DeCAF) addresses a fundamental computational bottleneck in biomolecular structure prediction—expensive diffusion rollouts—by introducing a novel flow map distillation framework that achieves comparable or better accuracy with 5x fewer inference steps. This has immediate, high-impact applications in drug discovery and protein engineering. The methodological contributions (SE(3)-equivariant flow maps, reward-guided inference-time search) are broadly applicable across generative modeling. Paper 1 addresses an important but narrower problem (LLM judge bias) with a useful but more incremental contribution. Paper 2's combination of methodological novelty, practical speedups, and applicability to drug discovery gives it broader and deeper scientific impact.

claude-opus-4-6 · Jun 9, 2026
