Guangyu Wang, Jingkun Yue, Siqi Zhang, Yu Liu, Xiaoyu Wang, Mingyuan Meng, Changwei Ji, Zongbo Han
A central goal of biomedicine is to understand, predict and ultimately control the dynamic mechanisms by which biological systems respond to perturbations, disease progression and therapeutic intervention. Although foundation models and large language models have accelerated biomedical data interpretation, most current systems remain focused on static pattern recognition rather than prospective simulation of biological futures. Here we propose biomedical world models as a paradigm for AI-driven discovery. These models learn latent representations of molecular, cellular, tissue and clinical states, together with intervention-conditioned dynamics that allow future trajectories to be simulated before actions are taken. We discuss how biomedical world models could function as data engines, environment simulators and scientific planning substrates across applications including virtual cells, organoids, virtual patients and surgical simulation. We outline the data infrastructure, evaluation benchmarks, safety constraints and governance frameworks required. Biomedical world models may provide a foundation for simulation-guided, closed-loop and experimentally actionable biomedical discovery.
This paper proposes "biomedical world models" as a conceptual paradigm for AI-driven biomedical discovery. The central idea is to move beyond static pattern recognition (which characterizes most current biomedical AI) toward learned simulators that can model how biological and clinical states evolve over time under interventions. The authors articulate three core capabilities: (1) data engines that learn multiscale latent representations, (2) environment simulators that capture intervention-conditioned dynamics, and (3) scientific action planners that support closed-loop reasoning. Use cases span molecular dynamics, virtual cells, virtual organoids, virtual patients, and surgical simulation.
This is fundamentally a perspective/vision paper that does not introduce a new model, algorithm, dataset, or experimental result. Its contribution is taxonomic and conceptual: organizing existing and emerging ideas under a unified framework and articulating a research agenda.
As a perspective paper, there is no experimental methodology to evaluate. The paper provides a formal notation for world models (encoder, transition dynamics, decoder) that is standard in the reinforcement learning literature and straightforwardly adapted to the biomedical setting. The taxonomy of world model types (sensory-space, latent-space, agent-coupled) is reasonable and well-organized.
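The encoder, transition dynamics, and decoder structure mentioned above can be illustrated with a minimal sketch. The paper's exact notation is not reproduced in this review, so the class name `LatentWorldModel`, its field names, and the toy hand-written functions below are all hypothetical stand-ins; in a real system each component would be a learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class LatentWorldModel:
    """Hypothetical container for the three components named in the review."""
    encode: Callable[[Vector], Vector]               # observation -> latent state
    transition: Callable[[Vector, Vector], Vector]   # (latent, intervention) -> next latent
    decode: Callable[[Vector], Vector]               # latent -> predicted observation

    def rollout(self, observation: Vector, interventions: Sequence[Vector]) -> List[Vector]:
        """Simulate future observations under a planned intervention sequence."""
        z = self.encode(observation)
        predictions = []
        for a in interventions:
            z = self.transition(z, a)       # advance the latent state one step
            predictions.append(self.decode(z))
        return predictions


# Toy, hand-written components (assumptions for illustration, not learned).
model = LatentWorldModel(
    encode=lambda x: [0.5 * v for v in x],
    transition=lambda z, a: [zi + ai for zi, ai in zip(z, a)],
    decode=lambda z: [2.0 * v for v in z],
)

predictions = model.rollout([2.0, 4.0], [[1.0, 0.0], [0.0, -1.0]])
print(predictions)
```

The rollout never touches the real system after the initial observation, which is the sense in which such a model could simulate trajectories before actions are taken.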
However, the paper lacks several elements that would strengthen its rigor as a perspective:
The conceptual framing has moderate potential impact. The idea of applying world models to biomedicine is not entirely new — digital twins, in silico clinical trials, and virtual patient models have been discussed extensively. The paper's contribution is in systematizing these ideas under the world model framework from AI, which could help bridge communities.
However, the practical impact depends entirely on whether the technical challenges identified can be overcome. The paper acknowledges that longitudinal, intervention-rich biomedical data are "fundamentally scarce" — this is perhaps the most honest and important observation in the paper, but it also undermines much of the optimism expressed elsewhere. Without adequate training data, biomedical world models will remain aspirational.
The use cases are varied but uneven in plausibility. Surgical simulation and virtual patients are closer to realization (some systems already exist in limited forms), while virtual cells that faithfully simulate perturbation responses across molecular scales remain far more speculative.
The paper is timely in the sense that world models are a hot topic in AI (Sora, Genie, V-JEPA2, Dreamer), and there is growing interest in moving biomedical AI beyond static prediction. The convergence of foundation models, large-scale perturbation datasets (Perturb-seq), and longitudinal health records creates a moment where this vision is worth articulating.
However, the paper is not the first to propose these ideas. Digital twins in medicine, virtual cells, and simulation-based clinical decision-making have been discussed in prominent venues. The Allen Institute's virtual cell initiative, for example, predates this paper and is cited. The paper's framing through the "world model" lens is somewhat novel but risks being primarily a terminological rebranding rather than a conceptual advance.
This is a well-organized perspective paper that articulates an important long-term vision for biomedical AI. However, it primarily synthesizes existing ideas under a new label rather than providing deep technical insights or empirical evidence. The gap between the vision and current capabilities is vast, and the paper does not sufficiently grapple with the fundamental obstacles. Its impact will likely be as a readable introduction to this research direction rather than as a seminal contribution that redirects the field.
Generated Jun 5, 2026
Paper 2 demonstrates a concrete, working system (Qiushi Discovery Engine) that achieves end-to-end autonomous scientific discovery on a real physical platform, including the first AI-discovered and experimentally validated novel physical mechanism (optical bilinear interaction). This represents a tangible milestone with immediate implications for autonomous science and optical computing. Paper 1, while visionary, is a perspective/proposal paper outlining a paradigm (biomedical world models) without demonstrating a working system. Paper 2's empirical validation, novelty of results, and proof-of-concept for autonomous discovery give it higher near-term scientific impact.
Paper 1 proposes a high-level but potentially transformative paradigm—biomedical world models enabling intervention-conditioned simulation across scales (cells to patients)—with clear, high-stakes real-world applications in drug discovery, personalized medicine, and clinical decision support. Its breadth spans AI, systems biology, and medicine, and it is timely given rapid progress in foundation models. Paper 2 is methodologically concrete and valuable as a benchmark for interactive spatial reasoning, but its impact is more contained to embodied/multimodal agent evaluation, whereas Paper 1 targets a broader, more consequential scientific and translational frontier.
Paper 1 presents a concrete, validated methodology (GSS) with demonstrated results showing >10x efficiency gains in structure search, combining generative models with physics-based exploration in a novel unified framework. It addresses a fundamental bottleneck in materials/molecular discovery with rigorous benchmarks. Paper 2, while visionary and broad in scope, is a perspective/proposal paper outlining a paradigm ('biomedical world models') without concrete implementations or empirical validation. While Paper 2 may inspire future work, Paper 1's immediately actionable contribution with demonstrated improvements gives it higher near-term scientific impact.
Paper 1 provides a highly rigorous, large-scale empirical evaluation (25,000 runs) that critically exposes the limitations of current AI scientists. By demonstrating that these systems lack true scientific reasoning, it provides an immediate, necessary course correction for the rapidly growing AI-for-science field. While Paper 2 offers a visionary concept with broad biomedical applications, Paper 1's concrete data and timely debunking of a hyped capability will likely have a more profound and immediate methodological impact on how AI systems are developed and evaluated.
Paper 2 presents a concrete, methodologically rigorous foundation model with a novel dataset, demonstrating immediate state-of-the-art empirical results and real-world applications in biomolecular design. While Paper 1 offers a highly relevant conceptual framework for future AI paradigms, Paper 2 delivers tangible, immediate scientific advancements across multiple biological modalities with proven predictive and generative capabilities.
While Paper 1 provides a valuable conceptual framework and roadmap for biomedical world models, Paper 2 actually implements and empirically validates such a model (HealthFormer). Paper 2 demonstrates exceptional methodological rigor and immediate real-world utility by successfully transferring to independent cohorts, outperforming established clinical risk scores, and accurately simulating clinical interventions that match published randomized controlled trials. This concrete empirical breakthrough in creating clinical digital twins gives Paper 2 a much higher potential for immediate and transformative scientific impact.
Paper 1 presents a concrete, implemented system (machine collective intelligence) with demonstrated empirical results across multiple scientific domains, showing dramatic improvements (six orders of magnitude error reduction) over deep neural networks while achieving interpretability. It addresses a fundamental bottleneck in AI-driven science with validated methodology. Paper 2, while visionary and addressing an important domain, is a perspective/proposal paper outlining a paradigm ('biomedical world models') without concrete implementation or empirical validation. Paper 1's demonstrated results, methodological novelty combining symbolism and metaheuristics, and broad applicability across scientific fields give it higher near-term scientific impact.
Paper 2 likely has higher near-term scientific impact due to its demonstrated scale (5M participants, >1T minutes), extensive empirical validation across 35 tasks, and concrete methodological contributions (scaling laws, few-shot transfer, generative estimation, LLM-agent search over predictive heads, clinician-rated Personal Health Agent). It targets an immediate, high-volume real-world data stream with clear translational pathways. Paper 1 is highly novel and potentially transformative but is primarily a conceptual framework with infrastructure/governance discussion and less methodological/experimental evidence, making its impact more speculative and longer-horizon.
Paper 1 likely has higher impact because it delivers a concrete, large-scale foundation model (43.8B events; up to 1.7B parameters) with extensive empirical validation (1,000+ tasks, external datasets, prospective/retrospective tests) and demonstrated utility for clinically and regulator-relevant applications (rare disease prediction, cost forecasting, reduced bias in target trial emulation). Its methodological rigor and immediate real-world deployability in RWE pipelines make near-term cross-stakeholder impact plausible. Paper 2 is timely and potentially transformative but is primarily a conceptual framework without demonstrated results, so impact is more speculative.
Paper 1 proposes a transformative paradigm shift in biomedicine, moving AI from static pattern recognition to dynamic simulation of biological systems. Its potential real-world applications, such as virtual patients and therapeutic intervention, have profound implications for human health. While Paper 2 addresses a highly relevant problem in AI agent alignment and economics, Paper 1 demonstrates broader interdisciplinary impact, higher potential for life-saving real-world translation, and exceptional timeliness given the current need to move from foundation models to actionable, simulated biological discovery.