Dongxin Lyu, Jingbo Zhou, Hongxin Xiang, Yuqiang Li, Jun Xia
De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.
MemNovo identifies and addresses a previously uncharacterized pathology in Transformer-based de novo peptide sequencing models: sensitivity imbalance, where autoregressive decoders progressively over-rely on peptide sequence priors while under-utilizing the physical evidence from mass spectra. The paper makes two linked contributions: (1) a Sensitivity Scaling Framework, a diagnostic that quantifies this imbalance by perturbing feature magnitudes at inference time, and (2) a training-free, plug-and-play memory re-injection mechanism that caches encoder outputs in a persistent memory bank and injects them into the final decoder layer via ultra-conservative residual connections (α = 0.005).
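The re-injection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`retrieve`, `reinject`), the use of scaled dot-product attention without learned projections, and the toy two-dimensional vectors are all assumptions made for illustration. Only the residual scale α = 0.005 and the idea of reading from cached encoder outputs come from the text.

```python
import math

ALPHA = 0.005  # residual scale quoted in the review; everything else here is assumed


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, memory):
    """Attend from one decoder state over cached encoder rows.

    Assumption: plain scaled dot-product attention with no learned
    query/key/value projections, so no new parameters are introduced.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, row)) / math.sqrt(dim)
        for row in memory
    ]
    weights = softmax(scores)
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(dim)]


def reinject(decoder_state, memory, alpha=ALPHA):
    """Add a small multiple of the retrieved spectral feature to the decoder state."""
    retrieved = retrieve(decoder_state, memory)
    return [h + alpha * r for h, r in zip(decoder_state, retrieved)]


# Hypothetical usage: `memory` stands in for encoder outputs cached once per spectrum.
memory = [[1.0, 2.0], [3.0, 0.0]]
state = [0.5, -0.5]
updated = reinject(state, memory)
```

Because α is small, the updated state stays close to the original one, which matches the "ultra-conservative" description: the retrieved spectral feature nudges the final decoding step rather than overriding it.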
The core insight—that autoregressive decoders in scientific domains may favor "linguistically plausible" outputs over physically grounded ones—is both intuitive and well-demonstrated. The finding that Casanovo exhibits a 15.4× sensitivity ratio between peptide and spectrum inputs is striking and actionable.
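To make the idea of a sensitivity ratio concrete, the sketch below perturbs the magnitude of one input stream at a time and compares how strongly a model output responds. The paper's exact metric is not given in this summary, so the definition used here (absolute output change divided by the relative size of the perturbation) is an assumption, and `toy_decoder_score` is a hypothetical linear stand-in rather than Casanovo or InstaNovo.

```python
def toy_decoder_score(spectrum_feat, peptide_feat, w_spec=1.0, w_pep=3.0):
    """Hypothetical stand-in for a decoder output.

    The weights are arbitrary and chosen so that the peptide stream dominates.
    """
    return w_spec * sum(spectrum_feat) + w_pep * sum(peptide_feat)


def scale(features, factor):
    """Multiply every feature by the same factor."""
    return [factor * x for x in features]


def sensitivity(score_fn, spectrum_feat, peptide_feat, stream, factor=1.1):
    """Output change per unit of relative scaling applied to one stream (assumed metric)."""
    base = score_fn(spectrum_feat, peptide_feat)
    if stream == "spectrum":
        perturbed = score_fn(scale(spectrum_feat, factor), peptide_feat)
    elif stream == "peptide":
        perturbed = score_fn(spectrum_feat, scale(peptide_feat, factor))
    else:
        raise ValueError("stream must be 'spectrum' or 'peptide'")
    return abs(perturbed - base) / abs(factor - 1.0)


def sensitivity_ratio(score_fn, spectrum_feat, peptide_feat, factor=1.1):
    """Peptide sensitivity divided by spectrum sensitivity."""
    pep = sensitivity(score_fn, spectrum_feat, peptide_feat, "peptide", factor)
    spec = sensitivity(score_fn, spectrum_feat, peptide_feat, "spectrum", factor)
    return pep / spec


ratio = sensitivity_ratio(toy_decoder_score, [1.0, 1.0], [1.0, 1.0])
```

A ratio well above 1 would indicate that the output reacts more to the peptide stream than to the spectrum stream, which is the kind of imbalance the review describes. For a real model, `score_fn` would wrap a forward pass and the features would be the actual embeddings.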
Within proteomics: The plug-and-play, training-free nature of MemNovo makes it immediately deployable with existing pre-trained models, lowering adoption barriers. The near-zero computational overhead (~1% latency increase) is practically important for large-scale proteomics pipelines. The case studies showing correction of near-isobaric mass confusions (deamidation, acetylation) address genuine pain points in PTM identification.
Broader ML implications: The sensitivity scaling framework could serve as a general diagnostic for multimodal encoder-decoder systems where fidelity to physical inputs is critical. The concept of "spectral under-utilization" may resonate in other scientific domains (e.g., molecular generation from spectroscopy, materials design) where models might favor learned priors over experimental data. However, the specific mechanism (projection-free cross-attention with tiny α) may be too domain-specific to transfer directly.
Limitations on impact: The gains on the stronger baseline (InstaNovo) are modest (up to 3.9% relative improvement in peptide precision), suggesting the problem may diminish as models improve. The method is inherently a post-hoc patch rather than a principled architectural solution, which may limit its long-term relevance.
The paper is well-timed. De novo peptide sequencing is experiencing rapid growth with multiple competing Transformer-based approaches. The identification of a systematic failure mode across these architectures fills a genuine gap—most prior work has focused on training strategies and architectural innovations rather than inference-time dynamics. The growing interest in inference-time enhancement (prompted by successes in LLMs) makes this work topically relevant.
MemNovo presents a clean, well-motivated contribution that identifies a real pathology in de novo peptide sequencing models and offers a practical fix. The diagnostic framework is the more lasting contribution, while the specific mechanism is an effective but potentially transient remedy. The work is methodologically sound with minor theoretical over-claims. Impact is moderate: immediately useful for practitioners using Casanovo-class models, but of diminishing value as baseline models improve.
Generated Jun 11, 2026
Paper 1 addresses a fundamental methodological issue in de novo peptide sequencing—a core proteomics problem—with rigorous theoretical analysis (mutual information restoration) and strong empirical results (up to 39.1% improvement). Its training-free, plug-and-play nature makes it broadly applicable to existing Transformer-based models. Paper 2 presents a useful engineering contribution for coding agents but addresses a narrower usability concern with less fundamental scientific depth. Paper 1's impact spans computational biology and machine learning, offering deeper methodological insights with broader scientific implications.
Paper 2 addresses a critical bottleneck in proteomics by improving de novo peptide sequencing. Its training-free, plug-and-play approach yields massive performance gains (up to 39.1%), offering immediate, high-impact applications in biological research and drug discovery. While Paper 1 provides rigorous theoretical advancements in machine learning, Paper 2 demonstrates more immediate real-world scientific utility.
DYSCO addresses a fundamental problem spanning representation learning, system identification, and scientific discovery with strong theoretical guarantees and broad applicability. Its ability to recover governing equations from noisy high-dimensional data has transformative potential across physics, neuroscience, and engineering. Paper 2, while practically useful for proteomics, presents an incremental inference-time fix (training-free plug-and-play) for existing models in a narrower domain. Paper 1's novelty in combining contrastive learning with symbolic equation recovery and its theoretical identifiability results give it broader and deeper scientific impact.
Paper 2 has higher potential scientific impact due to its profound implications for proteomics and drug discovery. Improving de novo peptide sequencing precision by up to 39% directly accelerates biological research and therapeutic development. While Paper 1 offers a valuable advancement in 3D motion generation from 2D data, Paper 2 addresses a critical bottleneck in an interdisciplinary field (AI for biology) where algorithmic improvements translate to significant real-world health and scientific breakthroughs.
Paper 2 likely has higher scientific impact: it addresses a well-defined, widely relevant failure mode in Transformer autoregressive decoding (over-reliance on priors vs evidence), proposes a training-free, plug-and-play inference mechanism with theoretical mutual-information justification, and shows substantial gains on a standard proteomics benchmark with negligible overhead—supporting methodological rigor and immediate adoption. Its idea may generalize beyond de novo sequencing to other evidence-conditioned generation tasks. Paper 1 offers strong systems-level efficiency/scalability, but impact depends on broader validation across diverse interacting-forecasting domains and clearer novelty relative to existing multi-task/state-space approaches.
MemNovo addresses a fundamental pathology in Transformer-based de novo peptide sequencing—over-reliance on sequence priors at the expense of spectral evidence—and proposes a training-free, plug-and-play solution with strong empirical gains (up to 39.1% relative improvement). This has direct real-world impact in proteomics, a field with broad biomedical applications. The insight about decoder attention drift may generalize to other encoder-decoder tasks. Paper 1 provides useful empirical analysis of on-policy distillation but is primarily descriptive/analytical with narrower practical implications.
Paper 2 identifies a fundamental pathology in state-of-the-art transformer models for proteomics and proposes a highly novel, training-free solution with massive performance improvements (up to 39.1%). Its foundational contribution to computational mass spectrometry offers broader applicability and higher methodological innovation compared to Paper 1, which, while rigorous and clinically relevant, represents a more standard application of existing multimodal machine learning techniques.
Paper 2 likely has higher near-term scientific impact: it identifies a concrete, broadly relevant inference pathology in Transformer decoders (over-reliance on priors vs. input evidence) and proposes a training-free, plug-and-play fix with strong empirical gains on standard proteomics benchmarks and minimal overhead—high application value and methodological rigor. Paper 1 is ambitious and potentially transformative, but its impact depends on community adoption and validation of a broad theoretical framework; such general theories often face slower uptake and harder empirical falsification.
Paper 2 (MemNovo) has higher estimated impact due to a clearer methodological insight (diagnosing an inference-time pathology in autoregressive decoders) paired with a general, training-free, plug-and-play fix with theoretical backing (mutual-information restoration) and strong benchmark gains in a core proteomics task. De novo peptide sequencing has immediate real-world utility in proteomics, immunopeptidomics, and biotech, and the proposed mechanism could transfer to other spectrum-to-sequence problems. Paper 1 is useful and practical, but RAG-style augmentation is less conceptually novel and more domain-specific.
Paper 1 likely has higher scientific impact due to broader applicability and timeliness: incomplete multimodal time-series is pervasive across healthcare sensing, wearables, and general multimodal ML. PAMF’s unified handling of within-modality and modality-level missingness, plus coupling imputation with downstream prediction via prior-aware flow matching and weight sharing, is a methodological contribution that can transfer to many tasks. Paper 2 is strong and practical but more niche (de novo peptide sequencing) and focuses on an inference-time add-on for existing decoders, limiting breadth despite clear gains in proteomics.