Ziwen Kan, Yishuo Chen, Kecheng Li, Andrew Wen, Xiaomeng Wang, Liwei Wang, Jihao Duan, Song Wang
Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at heterogeneous time scales or are partially absent. Existing approaches typically rely on naive imputation or masking strategies, which fail to account for cross-modal dependencies and often lead to misaligned or degraded representations. We propose TRACE, a conditional estimation paradigm for multimodal time series foundation model pipelines under missingness and irregular sampling, allowing incomplete target modalities to be systematically inferred from available auxiliary modalities. We evaluate TRACE on diverse multimodal benchmarks spanning healthcare and affective computing, including the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI benchmarks for multimodal sentiment analysis. Across a range of downstream prediction tasks and missing-modality settings, TRACE consistently outperforms prior multimodal fusion approaches, demonstrating improved robustness to severe modality missingness and more reliable cross-modal representations.
TRACE introduces a conditional estimation paradigm for multimodal time series foundation models (TS-FMs) that addresses temporal misalignment and partial modality missingness. The key insight is reframing missing modality data not as values to be deterministically filled (e.g., nearest-neighbor or zero imputation), but as latent temporal variables to be probabilistically estimated via conditional diffusion, leveraging cross-modal dependencies from available auxiliary modalities.
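"Probabilistically estimated via conditional diffusion" can be made concrete with the standard conditional DDPM (epsilon-prediction) objective, written here in the CSDI style. The notation is assumed for illustration and does not come from the paper:

- x_0 is the target modality.
- m^cond marks observed entries that are used as conditioning.
- m^tgt marks entries to be estimated.
- c is the cross-modal context built from auxiliary modalities.
- beta_s is the noise schedule.

This summary does not state TRACE's exact loss, so read this as a sketch of the general technique.

```latex
\begin{aligned}
\bar\alpha_t &= \prod_{s=1}^{t} (1 - \beta_s), \qquad
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\, \epsilon,\, t}\Big[\big\| m^{\text{tgt}} \odot \big(\epsilon - \epsilon_\theta(m^{\text{tgt}} \odot x_t,\; m^{\text{cond}} \odot x_0,\; t,\; c)\big) \big\|_2^2\Big]
\end{aligned}
```

At inference, the reverse process runs from pure noise on the missing entries. This yields samples from an approximation of p(x^miss | x^obs, c) rather than a single deterministic fill value, which is the contrast with zero or nearest-neighbor imputation drawn above.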
The architecture operates in two stages: (1) a multimodal conditional diffusion module that estimates missing components of each target modality conditioned on its observed entries and an MoE-gated cross-modal context, and (2) an MoE fusion layer (inherited from FuseMoE) for downstream prediction. The conditional diffusion stage uses DDPM with cross-modal conditioning signals constructed via a learnable gating mechanism over auxiliary modalities.
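Below is a minimal NumPy sketch of stage (1). It shows how a gated cross-modal context and a masked DDPM training loss fit together. Several parts are illustrative assumptions, not TRACE's implementation:

- The gate is a single softmax over concatenated per-modality embeddings.
- The schedule is the common linear beta schedule.
- The denoiser is a zero-output placeholder.
- Training targets are held-out observed entries, as in CSDI.

All function names (`gated_context`, `conditional_ddpm_loss`, `zero_denoiser`) are hypothetical.

```python
import numpy as np


def gated_context(aux_embeddings: np.ndarray, w_gate: np.ndarray):
    """Softmax gate over K auxiliary-modality embeddings (hypothetical gating stand-in).

    aux_embeddings: (K, E), one summary vector per auxiliary modality.
    w_gate: (K * E, K) gate parameters (random here, learned in practice).
    Returns the gated context vector (E,) and the gate weights (K,).
    """
    logits = aux_embeddings.reshape(-1) @ w_gate
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ aux_embeddings, weights


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """alpha_bar_t = prod_s (1 - beta_s) for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def zero_denoiser(x_t, x_cond, t, context):
    """Hypothetical placeholder epsilon-predictor; a real model would be a neural network."""
    return np.zeros_like(x_t)


def conditional_ddpm_loss(x0, cond_mask, tgt_mask, context, denoiser, alpha_bars, rng):
    """Epsilon-prediction loss for one sampled diffusion step, averaged over target entries."""
    t = rng.integers(len(alpha_bars))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps  # forward noising
    x_cond = np.where(cond_mask, x0, 0.0)  # observed entries passed in as conditioning
    eps_hat = denoiser(np.where(tgt_mask, x_t, 0.0), x_cond, t, context)
    sq_err = np.where(tgt_mask, (eps - eps_hat) ** 2, 0.0)  # ignore non-target entries
    return sq_err.sum() / max(int(tgt_mask.sum()), 1)


rng = np.random.default_rng(0)
T, D, K, E = 24, 4, 3, 8                          # time steps, channels, aux modalities, embed dim
x0 = rng.standard_normal((T, D))                  # toy target modality
observed = rng.random((T, D)) > 0.3               # roughly 30% missing
tgt_mask = observed & (rng.random((T, D)) < 0.2)  # held-out observed entries act as targets
cond_mask = observed & ~tgt_mask
aux = rng.standard_normal((K, E))                 # one embedding per auxiliary modality
context, gate = gated_context(aux, rng.standard_normal((K * E, K)))
loss = conditional_ddpm_loss(x0, cond_mask, tgt_mask, context, zero_denoiser,
                             linear_alpha_bars(100), rng)
print(gate.round(3), context.shape, round(float(loss), 3))
```

The loss is restricted to target entries, and observed values enter only through `x_cond`. As a result, the model is rewarded for inferring hidden values from what it sees, not for copying its inputs. In a full pipeline, the estimated modality would then pass to the stage (2) MoE fusion layer.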
Real-world applications: The healthcare setting is compelling. Clinical data routinely exhibits 30%+ missingness, and the MIMIC-IV experiments demonstrate practical relevance. If TRACE can reliably improve predictions under realistic clinical missingness patterns, this has direct implications for clinical decision support systems.
Methodological influence: The paradigm shift from "impute then fuse" to "conditionally estimate then fuse" is conceptually clean and could influence how future multimodal TS-FMs handle missingness. The framework is modular — the conditional diffusion stage can potentially be swapped with other probabilistic estimation methods.
The paper addresses a genuine and timely bottleneck. As TS-FMs proliferate, the gap between the clean-data assumptions of most foundation models and the messy reality of multimodal time series data becomes increasingly critical. The healthcare domain in particular has long struggled with missing data, and integrating foundation model capabilities with principled missingness handling is a relevant research direction.
The positioning relative to FuseMoE (NeurIPS 2024) is well-targeted — TRACE directly addresses FuseMoE's acknowledged limitation of naive imputation under severe sparsity.
The paper is well-written with clear exposition. The motivational Figure 1 effectively communicates the core advantage. However, the claim of being a "paradigm" may be slightly overstated given the specific instantiation choices. The modular design is appealing for future extensions, but the current evaluation is limited to a single diffusion-based instantiation. It does not explore alternative probabilistic estimation methods that could validate the paradigm-level claim.
The synthetic dataset construction (Appendix A.3) is thorough and could serve as a useful benchmark for future work on multimodal imputation under controlled conditions.
Generated Jun 5, 2026
TRACE addresses a fundamental and broadly applicable challenge in multimodal time series foundation models (temporal misalignment and modality missingness), which affects numerous real-world domains including healthcare and affective computing. Its contribution to foundation model pipelines for multimodal data has broader impact potential across multiple fields. While HERO presents a clever solution for multi-turn agent self-distillation, its scope is narrower, focusing on improving RL-based agents on specific benchmarks. TRACE's methodological contribution to handling missing modalities in foundation models addresses a more pervasive problem with wider applicability.
Paper 2 (ABC-Bench) has higher potential impact due to its novelty and timeliness in benchmarking agentic LLM bio-capabilities with direct biosecurity relevance, a high-stakes, cross-disciplinary area (AI, biology, security, policy). It includes methodological rigor via expert baselines and wet-lab validation demonstrating real-world agent performance. Its applications span evaluation, governance, and risk mitigation, likely influencing standards and regulation. Paper 1 is a solid methodological contribution to multimodal time-series robustness with clear applied value, but its breadth and urgency are narrower than biosecurity benchmarking.
Paper 1 presents a novel theoretical framework combining online learning, Pandora's Box theory, and LLM cascading with rigorous regret bounds. It introduces a fundamentally new problem formulation (output-mediated feedback in contextual Pandora's Box) with strong methodological contributions (GMM-based reservation index estimation, UCB confidence bounds) and provable guarantees. Paper 2, while addressing a practical problem of multimodal missingness, offers a more incremental contribution—conditional estimation for handling missing modalities—building on existing foundation model pipelines. Paper 1's theoretical novelty and its direct relevance to the rapidly growing LLM deployment ecosystem give it broader and deeper potential impact.
Paper 2 (Agents' Last Exam) has higher potential impact due to its broad, timely relevance to evaluating economically meaningful agent performance, a major bottleneck in applied AI. Its large-scale, expert-curated, verifiable, long-horizon benchmark could shape research directions across LLM agents, evaluation, alignment, and human-computer interaction, and drive real-world deployment standards. Paper 1 is a solid methodological contribution to multimodal time-series robustness, but its impact is confined to specific modalities and tasks and is likely incremental relative to the sweeping cross-domain influence a widely adopted benchmark can have.
Paper 2 presents a novel, successful methodology to solve a pervasive problem in multimodal time series (temporal misalignment and missing data), with direct real-world applications in high-impact domains like healthcare. In contrast, Paper 1 presents a scoped negative result on cross-model activation transfer in a specific LLM setup. While valuable for mechanistic understanding, Paper 2's positive contribution to foundation models offers broader applicability, methodological innovation, and immediate utility across diverse fields.
TRACE addresses a fundamental and practical challenge in multimodal time series foundation models—temporal misalignment and modality missingness—which is pervasive across healthcare, affective computing, and many other domains. Its methodological contribution (conditional estimation for cross-modal inference) is broadly applicable and integrates with the rapidly growing foundation model ecosystem. Paper 2 (SAGE) provides interesting insights about social vs. self-improvement in LLM agents, but its findings are more incremental (social learning helps weaker agents but not the strongest) and the evaluation framework, while novel, addresses a narrower and more transient research question. TRACE's real-world applicability and alignment with the foundation model trend give it higher impact potential.
TRACE addresses a fundamental and widespread challenge in multimodal time series foundation models—temporal misalignment and modality missingness—with a principled conditional estimation framework validated across healthcare and affective computing. This has broad real-world applicability (clinical data, sensor fusion) and contributes to the rapidly growing foundation model paradigm. Paper 2 (ToolMaze) provides a useful benchmark for LLM agent robustness to tool failures, offering valuable empirical insights, but benchmarks tend to have more transient impact than methodological contributions. TRACE's cross-domain applicability and methodological novelty give it higher long-term scientific impact.
Paper 2 addresses a highly timely and socially significant topic—the environmental impact of hyperscale data centers driven by AI growth. It provides the first comprehensive facility-level assessment of 403 US hyperscale data centers, offering concrete empirical data (68-99 TWh consumption, carbon intensity 48% above grid average) that will be widely cited in policy, sustainability, and computing research. Its broad interdisciplinary relevance spanning energy policy, environmental science, and computer science, combined with immediate real-world policy implications, gives it higher potential impact than Paper 1's more incremental technical contribution to multimodal time series modeling.
Continual learning in LLMs is a critical bottleneck for developing autonomous AI agents. By providing the first expert-validated benchmark across diverse domains, Paper 1 establishes a foundational evaluation standard that will likely drive broad research in AI memory and learning systems. While Paper 2 offers a valuable methodological improvement for multimodal time-series foundation models, Paper 1 addresses a more universally recognized challenge in the rapidly expanding and highly impactful field of frontier AI systems, giving it a higher potential for broad scientific impact.
TRACE addresses a fundamental challenge in multimodal time series foundation models—temporal misalignment and modality missingness—which is pervasive across healthcare, affective computing, and many other domains. Its contribution to the rapidly growing field of foundation models for time series, combined with its broad applicability across modalities and domains, gives it wider potential impact. FIDES, while technically strong and addressing an important RAG faithfulness problem, targets a more specific issue (retrieval-memory conflict in LLM decoding) with a training-free inference-time fix that may be superseded as models improve. TRACE's paradigm for conditional estimation under missingness has more foundational, cross-disciplinary relevance.