MIMIC: A Generative Multimodal Foundation Model for Biomolecules

Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti

Apr 27, 2026 · arXiv:2604.24506v1

cs.AI · cs.LG

#3 of 4479 · Artificial Intelligence

Bronze · Week 18, 2026

Tournament Score: 1686 ± 25

Win Rate: 97% (182 wins in 188 matches)

Rating: 8/10

Significance 8.5 · Rigor 7 · Novelty 8 · Clarity 7

Abstract

Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and aligned dataset, LORE, linking nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual modalities within partially observed biomolecular states. MIMIC uses a split-track encoder-decoder architecture to condition on arbitrary subsets of observed modalities and reconstruct or generate missing components of molecular state across the genome, transcriptome, and proteome. Multimodal conditioning consistently improves MIMIC's sequence reconstruction relative to sequence-only inputs, while its learned representations enable state-of-the-art performance on RNA and protein downstream tasks. MIMIC achieves state-of-the-art splicing prediction, and its joint generative formulation enables isoform-aware inference that further improves performance. Beyond prediction, the same generative framework supports constrained design. For RNA, MIMIC identifies corrective edits in a clinically relevant HBB splice-disrupting mutation without reverting it by using evolutionary and structural signals. For proteins, jointly conditioning on shape and surface chemistry of PD-L1 and hACE2 binding sites produces diverse, high-confidence sequences with strong in silico support for target binding. Finally, MIMIC uses experimental context as semantic conditioning to model assay-dependent RNA chemical probing, rather than treating context as a fixed output. Together, these results position MIMIC's aligned multimodal generative modeling as a strong foundation for unifying representation learning, conditional prediction, and constrained biomolecular design within a single model.

AI Impact Assessments (3 models)

Scientific Impact Assessment: MIMIC – A Generative Multimodal Foundation Model for Biomolecules

1. Core Contribution

MIMIC introduces a unified generative foundation model that jointly represents and reasons across heterogeneous molecular biology data spanning the central dogma — from DNA/RNA sequences through protein structures. The key intellectual contribution is the reframing of biological foundation modeling from siloed, forward-prediction tasks toward approximating the *joint distribution* of molecular states across modalities. This is accompanied by LORE, a curated multimodal dataset aligning ~15.5M proteins and ~13M RNA transcripts with evolutionary, structural, regulatory, and semantic/contextual annotations.

The split-track encoder-decoder architecture is the central architectural innovation: co-aligned features (e.g., phyloP conservation aligned to nucleotide positions) are summed within tracks, while distinct molecular entities are concatenated across tracks. This keeps sequence length tractable while preserving positional alignment. The asymmetric design (10K-token encoder context, 1K-token decoder context) allows conditioning on broad molecular context while generating focused outputs.
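The sum-within-track, concatenate-across-tracks rule can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the toy embedding width `D`, the random lookup tables, the linear projection `PHYLOP_PROJ` for scalar phyloP scores, and the helper names `rna_track`, `protein_track`, and `assemble` are hypothetical and are not taken from the MIMIC code, which has not been released.

```python
import random

# Hypothetical sketch of split-track input assembly (not the MIMIC implementation).
D = 4  # toy embedding width (assumption)
rng = random.Random(0)

def random_vector(dim):
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# Toy lookup tables for nucleotides and amino acids (assumptions).
NUC_EMB = {c: random_vector(D) for c in "ACGU"}
AA_EMB = {c: random_vector(D) for c in "ACDEFGHIKLMNPQRSTVWY"}
PHYLOP_PROJ = random_vector(D)  # linear map from a scalar phyloP score to D dims

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(v, s):
    return [a * s for a in v]

def rna_track(seq, phylop):
    """Sum co-aligned features within a track: still one token per nucleotide."""
    assert len(seq) == len(phylop), "features must be aligned to positions"
    return [add(NUC_EMB[n], scale(PHYLOP_PROJ, p)) for n, p in zip(seq, phylop)]

def protein_track(seq):
    """A distinct molecular entity gets its own track."""
    return [AA_EMB[a] for a in seq]

def assemble(tracks):
    """Concatenate tracks along the sequence axis, recording each token's track."""
    tokens, track_ids = [], []
    for tid, track in enumerate(tracks):
        tokens.extend(track)
        track_ids.extend([tid] * len(track))
    return tokens, track_ids

rna = rna_track("AUGGCU", [0.5, 1.2, 2.0, -0.3, 0.8, 0.1])
prot = protein_track("MA")
tokens, track_ids = assemble([rna, prot])
print(len(tokens), track_ids)  # 8 [0, 0, 0, 0, 0, 0, 1, 1]
```

Because the phyloP score is added into the nucleotide token rather than appended as its own token, the RNA track stays at one token per nucleotide; only the protein, a distinct entity, lengthens the input.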

2. Methodological Rigor

Architecture and training: The split-track summation, RoPE with group-reset, heuristic pathway sampling (~25 pathways), and staged curriculum (1K→10K tokens) are well-motivated engineering choices. The register token mechanism with random dropout for self-supervision is clever, though the paper could better analyze what information registers actually encode.
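The review does not spell out what "group-reset" means for RoPE. One plausible reading, sketched below in Python as an assumption rather than the paper's documented scheme, is that rotary position indices restart at 0 at the start of each track, while the rotation itself is standard RoPE. The function names `group_reset_positions` and `rope` are hypothetical.

```python
import math

# Hypothetical reading of "RoPE with group-reset": rotary position indices
# restart at 0 whenever a new track (group) begins. This is an assumption.

def group_reset_positions(track_ids):
    """Return per-token position ids that restart whenever the track id changes."""
    positions = []
    prev, pos = None, 0
    for tid in track_ids:
        pos = 0 if tid != prev else pos + 1
        positions.append(pos)
        prev = tid
    return positions

def rope(vec, pos, base=10000.0):
    """Standard rotary embedding: rotate each pair (x[2k], x[2k+1])
    by the angle pos * base^(-2k/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i == 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

print(group_reset_positions([0, 0, 0, 1, 1, 2]))  # [0, 1, 2, 0, 1, 0]
```

Under this reading, position k in one track and position k in another receive the same rotary phase, so their relative offset is zero; since RoPE makes query-key dot products depend only on that offset, this may help attention relate positionally corresponding tokens across tracks.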

Evaluation breadth: The paper evaluates across an impressive range of tasks — sequence completion, protein benchmarks (PFMBench, 11 tasks), RNA benchmarks (mRNABench, 7 tasks), variant effect prediction, splice prediction, RNA design, protein design, and context-conditioned chemical probing. State-of-the-art claims are supported by comparison to strong baselines (ESM3, ESM-C, Evo2, AlphaGenome, SpliceAI, Orthrus).

Potential concerns: Several evaluation design choices warrant scrutiny. For splice prediction, all models are evaluated on MIMIC's test set, which is not necessarily held out from the other models' training data; this complicates a like-for-like comparison with competitors. The protein design evaluation relies heavily on in silico metrics (AlphaFold2 pLDDT, AlphaFold3 iPTM) without experimental validation. For hACE2, only 2/20 high-confidence designs achieved iPTM > 0.75, which is modest. The RNA design experiment (HBB splice correction) uses SpliceAI as its oracle but lacks experimental validation. On the positive side, the ablation study (Figures S3–S4) demonstrates that multimodal conditioning helps most when auxiliary modalities are "ambiguous", i.e. partially predictable from sequence, which is an insightful mechanistic finding.

3. Potential Impact

Unification paradigm: If MIMIC's approach scales, it could fundamentally change how the field approaches biological foundation models — moving from task-specific architectures to unified multimodal generative models. The ability to condition on arbitrary subsets of modalities and generate missing ones is a powerful abstraction.
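Conditioning on a subset of modalities and generating the rest can be pictured as choosing, for each partially observed record, which modalities go to the encoder and which the decoder must produce. The paper reportedly does this with ~25 heuristic, hand-defined pathways; the Python sketch below uses a five-entry list whose modality names and pathways are invented for illustration, and `sample_pathway` is a hypothetical helper.

```python
import random

# Hypothetical pathways: (observed modalities -> encoder, target modalities -> decoder).
# Names and entries are illustrative only, not the paper's pathway list.
PATHWAYS = [
    ({"rna_seq"}, {"phylop"}),
    ({"rna_seq", "phylop"}, {"splice_sites"}),
    ({"phylop", "rna_structure"}, {"rna_seq"}),
    ({"protein_structure"}, {"protein_seq"}),
    ({"protein_seq", "context_text"}, {"protein_structure"}),
]

def sample_pathway(available, rng):
    """Pick a pathway whose observed and target modalities are all present
    in this partially observed record; return None if none applies."""
    candidates = [(obs, tgt) for obs, tgt in PATHWAYS if (obs | tgt) <= available]
    if not candidates:
        return None
    return rng.choice(candidates)

record = {"rna_seq", "phylop", "splice_sites"}
rng = random.Random(0)
obs, tgt = sample_pathway(record, rng)
print(sorted(obs), "->", sorted(tgt))
```

Filtering by availability matters because records are only partially observed: a pathway can be used only when every modality it reads or writes is present.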

Dataset contribution: LORE addresses a genuine infrastructure gap. Aligned multimodal biological datasets are scarce, and the community would significantly benefit from its release (currently pending).

Design applications: The constrained design demonstrations — RNA splice correction and protein binder design — showcase the inverse-problem capability that distinguishes this work from forward-prediction models. The HBB splice correction example, where MIMIC identifies compensatory edits without reverting the pathogenic mutation, is particularly compelling conceptually.

Variant effect prediction: The phyloP VEP approach, using a two-dimensional feature space (wild-type constraint + mutation-induced perturbation), dramatically outperforms embedding-based approaches (+82% on complex variants). This interpretable, low-dimensional approach could have immediate practical utility.
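The two-dimensional phyloP VEP feature space can be sketched as follows, assuming the model returns a per-position phyloP prediction for both the wild-type and the mutant sequence. The feature definitions (the predicted wild-type phyloP at the variant site as constraint, and the mean absolute change in predicted phyloP over a small window as perturbation), the `window` parameter, the toy values, and the helper name `vep_features` are assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch: two VEP features from predicted per-position phyloP tracks.
# wt_pred / mut_pred stand in for model predictions on wild-type and mutant input.

def vep_features(wt_pred, mut_pred, pos, window=2):
    """Return (wild-type constraint, mutation-induced perturbation)."""
    assert len(wt_pred) == len(mut_pred)
    assert 0 <= pos < len(wt_pred)
    lo, hi = max(0, pos - window), min(len(wt_pred), pos + window + 1)
    # Constraint: how conserved the model predicts the wild-type site to be.
    constraint = wt_pred[pos]
    # Perturbation: mean absolute shift in predicted phyloP around the variant.
    diffs = [abs(m - w) for w, m in zip(wt_pred[lo:hi], mut_pred[lo:hi])]
    perturbation = sum(diffs) / (hi - lo)
    return constraint, perturbation

wt = [0.2, 1.5, 3.0, 2.8, 0.4]   # toy wild-type predictions
mut = [0.2, 1.1, 0.5, 2.0, 0.4]  # toy mutant predictions
constraint, perturbation = vep_features(wt, mut, pos=2)
print(f"constraint={constraint:.2f}, perturbation={perturbation:.2f}")  # constraint=3.00, perturbation=0.74
```

A pair like this could then feed a small classifier such as logistic regression, which keeps the score low-dimensional and easy to inspect, in contrast to probing a high-dimensional embedding.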

4. Timeliness & Relevance

This work addresses a clear bottleneck: biological AI remains fragmented across modalities, with most models treating biology as a collection of independent prediction problems. The multimodal approach aligns with the biological reality that function emerges from coupled constraints. The paper arrives at a moment when the field is saturated with single-modality models and is increasingly recognizing their limitations. The semantic conditioning, which uses natural language descriptions of experimental context, is particularly timely given the growing recognition that biological measurements are context-dependent.

5. Strengths & Limitations

Key strengths:

Breadth of evaluation across diverse tasks, with consistently strong performance at 1B parameters

The LORE dataset fills a genuine gap in aligned multimodal biological data

Multimodal conditioning ablations provide mechanistic insight, not just performance numbers

The phyloP VEP demonstrates that generative outputs can be directly task-relevant, bypassing the embedding→probe bottleneck

Context-conditioned RASP2 prediction demonstrates principled handling of experimental variability

Notable limitations:

No experimental validation for any design task — all evaluations are computational

The 1K decoder window significantly limits generation length, problematic for many biological applications

LORE coverage is uneven; many important biological modalities (chromatin conformation, protein-protein interactions, post-translational modifications) are absent

The heuristic pathway sampling with ~25 manually defined pathways may not scale gracefully

Model and dataset are not yet released, preventing independent verification

The paper is extremely dense, making independent assessment of individual claims challenging

Protein design results for hACE2 are weak (2/20 predicted binders), though honestly reported

Scalability questions: At 1B parameters, MIMIC is relatively modest. Whether the multimodal approach continues to outperform scaling-focused single-modality models at larger sizes remains an open question. The authors argue for "diverse data over pure scaling," but this hypothesis needs testing at scale.

Summary

MIMIC represents an ambitious and largely successful attempt to unify biological foundation modeling across the central dogma. Its strongest contributions are conceptual (the joint distribution framing), architectural (split-track encoder-decoder), and empirical (consistent SOTA across diverse benchmarks). The lack of experimental validation for design tasks and the pending release of code/data are significant limitations for assessing real-world impact. Nevertheless, the work establishes a compelling paradigm that could reshape how the field builds and evaluates biological AI systems.

Rating: 8/10

Significance 8.5 · Rigor 7 · Novelty 8 · Clarity 7

Generated Apr 28, 2026

Comparison History (188)

Won vs. STOCKTAKE: Measuring the Gap Between Perception and Action in LLM Agents with a Fair Oracle

Paper 2 has higher potential impact because it introduces a foundational multimodal generative model unifying sequence, structure, and evolution for biomolecules. This approach broadly advances computational biology, genomics, and drug discovery. While Paper 1 offers a rigorous, timely benchmark for LLM agents, its impact is primarily confined to AI evaluation and operations research. In contrast, Paper 2's capacity for clinical mutation correction and novel therapeutic protein design targets profound real-world medical applications, representing a paradigm shift in biomolecular design with a significantly wider scientific footprint.

gemini-3.1-pro-preview · Jul 16, 2026

Won vs. Attention Limited Reward Learning

MIMIC presents a generative multimodal foundation model unifying sequence, structure, regulatory, evolutionary, and contextual modalities for biomolecules—a significant architectural and methodological advance with broad applications in drug design, RNA editing, protein engineering, and splicing prediction. It achieves state-of-the-art on multiple benchmarks and demonstrates practical design capabilities (HBB mutation correction, PD-L1/hACE2 binder design). While Paper 1 offers valuable theoretical insights into RLHF limitations via rational inattention, its impact is more narrowly focused on AI alignment methodology. Paper 2's breadth across biology and medicine, combined with its novel multimodal framework, suggests higher overall scientific impact.

claude-opus-4-6 · Jul 7, 2026

Won vs. World-Model Collapse as a Phase Transition

While Paper 1 offers a fascinating theoretical diagnostic for AI agents, Paper 2 demonstrates profound real-world applications in biomedicine and drug discovery. By unifying multiple biological modalities into a single generative foundation model, MIMIC achieves state-of-the-art performance in clinically relevant tasks, such as correcting splice-disrupting mutations and designing protein binders. Its breadth of impact across genomics, proteomics, AI, and medicine, combined with its ability to perform both predictive and generative biomolecular design, gives it a significantly higher potential for transformative scientific and societal impact.

gemini-3.1-pro-preview · Jul 1, 2026

Won vs. Socratic agents for autonomous scientific discovery in high-dimensional physical systems

Paper 1 (MIMIC) unifies biological modalities into a single generative foundation model, demonstrating profound real-world applications in clinical genetics and drug discovery. Its ability to identify corrective RNA edits and design protein binders for critical targets provides immediate, transformative value across the life sciences. While Paper 2 offers an innovative multi-agent framework for autonomous physical discovery, Paper 1's broader applicability across biology, medicine, and bioinformatics, coupled with its direct implications for human health, gives it a significantly higher potential for broad scientific and societal impact.

gemini-3.1-pro-preview · Jun 26, 2026

Won vs. Solving Inverse Problems of Chaotic Systems with Bidirectional Conditional Flow Matching

Paper 1 likely has higher impact due to greater breadth and immediacy of real-world applications: a multimodal generative foundation model spanning RNA/protein sequence, structure, regulation, evolution, and context, enabling SOTA prediction plus constrained biomolecular design (clinically relevant splicing repair, binding-site–conditioned protein design). Its novelty is amplified by a curated aligned dataset and arbitrary-subset conditioning, positioning it as reusable infrastructure across genomics, proteomics, and drug discovery. Paper 2 is methodologically strong and timely for inverse chaotic dynamics, but its cross-domain uptake is narrower and may remain specialized.

gpt-5.2 · Jun 24, 2026

Won vs. Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

Paper 1 has higher potential scientific impact as it introduces a unifying multimodal foundation model for biology, a field where such models consistently drive massive paradigm shifts. By integrating sequence, structure, and evolutionary data, MIMIC enables both state-of-the-art prediction and constrained generative design for direct clinical applications, such as correcting RNA mutations and designing target-binding proteins. While Paper 2 offers a valuable methodological advance for AI computer-use agents, Paper 1's capacity to accelerate drug discovery and fundamentally advance computational life sciences presents a much more profound real-world impact across medicine and biology.

gemini-3.1-pro-preview · Jun 24, 2026

Won vs. OpenThoughts-Agent: Data Recipes for Agentic Models

Paper 2 (MIMIC) is likely higher impact: it introduces a genuinely multimodal generative foundation model spanning sequence, structure, regulation, evolution, and context, enabling both prediction and constrained biomolecular design—clear real-world relevance to therapeutics and functional genomics. The aligned LORE dataset and split-track architecture broaden applicability across biology and ML, and reported state-of-the-art results plus concrete clinical/design case studies suggest strong utility. Paper 1 is valuable and rigorous for open agent data curation, but its contributions are more incremental and narrower to agent benchmark performance.

gpt-5.2 · Jun 24, 2026

Won vs. SPADE: Structure-Prior Adaptive Decision Estimation

Paper 2 presents a multimodal foundation model for biology with immense potential for real-world impact in drug discovery, therapeutics, and biomolecular design. While Paper 1 offers elegant theoretical advancements in scientific machine learning, Paper 2's ability to unify genomics, transcriptomics, and proteomics to generate clinically relevant edits and achieve SOTA predictive capabilities aligns with highly timely, transformative trends in AI-driven biology, granting it broader cross-disciplinary applicability and higher anticipated scientific impact.

gemini-3.1-pro-preview · Jun 23, 2026

Won vs. DART: Draft-Agreement Routing for Training-Free Adaptive Thinking Budgets in Hybrid Reasoning Models

MIMIC presents a fundamentally new multimodal foundation model for biomolecules that unifies sequence, structure, evolution, regulation, and context across nucleic acids and proteins—a significantly broader scientific contribution. It introduces a novel dataset (LORE), achieves SOTA on multiple downstream tasks, and demonstrates practical applications in RNA editing and protein design with clinical relevance (HBB mutations, PD-L1 binding). While DART is a solid engineering contribution for efficient LLM inference routing, MIMIC addresses a deeper scientific problem with broader impact across biology, drug design, and genomics, and represents a more transformative methodological advance.

claude-opus-4-6 · Jun 23, 2026

Won vs. Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

MIMIC presents a novel generative multimodal foundation model that unifies sequence, structure, evolutionary, regulatory, and contextual modalities for biomolecules, achieving state-of-the-art results across multiple tasks and enabling constrained molecular design. Its breadth of applications—from splicing prediction to protein design to RNA editing—gives it enormous potential real-world impact in drug discovery and synthetic biology. While Paper 1 makes an important methodological contribution by exposing measurement artifacts in LLM psychological profiling, its impact is more narrowly corrective, cautioning against a specific practice rather than opening new research directions.

claude-opus-4-6 · Jun 19, 2026
