Yifan Hu, Hongzhou Chen, Peiyuan Liu, Yiding Liu, Zewei Dong, Jiang-Ming Yang
Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolved from impute-then-forecast pipelines to continuous-time models such as Neural ODEs and continuous-time graph networks. While these approaches improve the modeling of historical irregularity, they still rely on an implicit oracle assumption at inference time: the timestamps of future valid observations are presumed to be known in advance. This assumption limits practical relevance, since in many real systems the more fundamental question is not only what the future value will be, but also whether a valid observation will occur at all. In this paper, we propose Timeflies, a unified framework that reformulates forecasting as a joint problem of future observability inference and value estimation. To explicitly model the interaction between observation dynamics and state evolution, Timeflies adopts an observation stream and a value stream, coupled through three dedicated modules for reliability-aware embedding, observation-guided dependency modeling, and joint prediction. We further construct Shadow, a benchmark that combines natural missingness from public datasets with real-world industrial data, and introduce the Observation-Value Joint Entropy (OVJE) metric to comprehensively evaluate this coupled predictability. Extensive experiments show that Timeflies consistently outperforms existing methods, highlighting the importance of explicitly modeling future observability in time series forecasting with missing values. Code and dataset are available at https://github.com/ant-intl/Timeflies.
The paper identifies a genuine and underappreciated assumption in time series forecasting with missing values: existing methods (including Neural ODEs and continuous-time graph networks) assume future observation timestamps are known at inference time—what the authors term the "perfect foresight assumption." The paper reframes forecasting as a joint problem: predicting *whether* an observation will occur and *what* its value will be. This is operationalized through Timeflies, a dual-stream architecture with an observation stream and value stream, coupled through reliability-aware embedding, observation-guided attention, and joint prediction heads. The paper also contributes the Shadow benchmark (31 datasets with natural missingness) and the OVJE metric for joint evaluation.
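To make the joint formulation concrete, the following is a minimal sketch of what a combined objective over observability and values could look like. It is not the paper's loss: the function name `joint_loss`, the weighting parameter `alpha`, and the choice of binary cross-entropy plus masked squared error are all assumptions made for illustration.

```python
import numpy as np


def joint_loss(obs_logits, value_pred, mask_true, value_true, alpha=1.0):
    """Hypothetical joint objective (illustration only, not the paper's loss).

    obs_logits : predicted logits for "a valid observation occurs at this step"
    value_pred : predicted values for every future step
    mask_true  : 1 where an observation actually occurred, else 0
    value_true : observed values (may hold NaN where nothing was observed)
    alpha      : assumed weight balancing the two terms
    """
    observed = np.asarray(mask_true).astype(bool)
    y = observed.astype(float)
    z = np.asarray(obs_logits, dtype=float)

    # Numerically stable binary cross-entropy computed from logits.
    bce = np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))

    # Squared error counts only where a valid observation exists;
    # unobserved steps (including NaN targets) contribute zero.
    diff = np.asarray(value_pred, dtype=float) - np.asarray(value_true, dtype=float)
    sq_err = np.where(observed, diff ** 2, 0.0)
    n_obs = observed.sum()
    mse = sq_err.sum() / n_obs if n_obs > 0 else 0.0

    return float(bce + alpha * mse)
```

The point of the sketch is that the value term is never evaluated at steps with no observation, so a model cannot lower its error by guessing values for timestamps that never materialize; it has to predict the mask as well.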
The conceptual framing is compelling. In many real systems (IoT, e-commerce, healthcare), the question of whether data will be available is genuinely as important as what the data will show. The philosophical title "Existence Precedes Value" effectively communicates this insight.
Architecture design: The three-module design (reliability-aware patch embedding, observation-guided attention, dual prediction head) is well-motivated. The reliability score ρ that attenuates sparse patches is a sensible inductive bias. The observation-conditioned attention mechanism (Eq. 7) elegantly injects missingness priors into the value attention without replacing it.
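As a rough illustration of the two ideas praised above, the sketch below computes a per-patch reliability score and uses it as an additive bias on attention scores. This is an assumption-laden toy, not a reproduction of the paper's Eq. 7: defining ρ as the fraction of observed points in a patch, the log-bias form, and the names `patch_reliability`, `obs_guided_attention`, `beta`, and `eps` are all hypothetical.

```python
import numpy as np


def patch_reliability(mask, patch_len):
    """Assumed definition of rho: fraction of observed points in each patch.

    mask is a 1-D 0/1 array whose length is a multiple of patch_len.
    """
    mask = np.asarray(mask, dtype=float)
    return mask.reshape(-1, patch_len).mean(axis=1)


def obs_guided_attention(q, k, v, rho, beta=1.0, eps=1e-6):
    """Scaled dot-product attention with a hypothetical missingness bias.

    The value-stream scores q @ k.T are kept; a term beta * log(rho + eps)
    is added per key patch, so sparse patches are down-weighted rather
    than removed.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores + beta * np.log(rho + eps)[None, :]
    # Softmax over key patches, shifted for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Three patches of length 4: fully observed, empty, half observed.
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0])
rho = patch_reliability(mask, patch_len=4)

# With identical queries and keys, only the bias separates the patches.
q = np.zeros((3, 2))
k = np.zeros((3, 2))
v = np.eye(3)
out, weights = obs_guided_attention(q, k, v, rho)
```

In this toy setup the empty patch receives almost no attention mass while the half-observed patch still contributes, which is the sense in which a missingness prior can modulate value attention without replacing it.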
Benchmark construction: Shadow is a meaningful contribution—combining 15 public datasets from GIFT-Eval with 16 proprietary e-commerce datasets with natural (not artificially masked) missingness. The stratification into four sparsity regimes and three forecasting horizons creates 150 evaluation settings, which is thorough.
Evaluation concerns: Ablation studies are reasonably comprehensive, though some components show mixed results (e.g., removing residual fusion actually improves MSE in the high-missing regime: 1.491 without it vs. 1.546 with it).
Practical relevance: The problem formulation addresses a genuine deployment gap. In production systems, knowing when to expect data is valuable for resource allocation, alerting, and decision-making. The e-commerce datasets from Ant International demonstrate real industrial applicability.
Paradigm shift potential: The reframing from "predict values at all timesteps" to "predict both existence and values" could influence how the community approaches irregular time series. However, the practical scenarios where this matters most (high missingness, event-driven sampling) remain somewhat niche within the broader time series forecasting community.
Benchmark and metric adoption: Shadow could see adoption if the proprietary datasets are actually released (the paper claims code and data availability). The OVJE metric introduces a useful concept but may need refinement before widespread adoption.
The paper addresses a timely concern as IoT, edge computing, and event-driven architectures proliferate, creating increasingly irregular data streams. The connection to the GIFT-Eval benchmark ecosystem is strategic. However, the paper arrives in a crowded space of time series forecasting papers, and the specific problem of jointly predicting observability and values may appeal more to applied ML practitioners than to the broader forecasting research community.
Reproducibility: Code and dataset release is promised. The architecture is clearly described with sufficient detail for reimplementation.
This paper makes a meaningful conceptual and practical contribution by formalizing the joint prediction of observational existence and values in time series forecasting. The benchmark and problem formulation are the strongest contributions. The technical execution is solid but not groundbreaking—the dual-stream architecture with attention modulation follows established patterns. The experimental evaluation, while extensive in scope, is weakened by the absence of the most relevant baselines (irregular time series methods). The work should influence practitioners working with naturally incomplete time series but may have limited impact on the broader forecasting research community.
Generated Jun 12, 2026
Paper 2 addresses a broadly relevant, under-modeled real-world constraint in time series forecasting: future observation existence is unknown. Jointly modeling observability and values (plus a new benchmark and metric) increases novelty, practical applicability, and likelihood of adoption across domains with irregular sensing (IoT, healthcare, finance, industrial monitoring). Providing code/dataset also boosts impact and reproducibility. Paper 1’s loss reformulation is interesting and potentially useful, but appears closer to incremental loss-function engineering with narrower cross-field reach and less clear standardization potential than a full framework + benchmark for a common deployment gap.
AuthorityBench addresses a critical and timely problem—how citation-based authority signals influence LLM hallucination—with a rigorous large-scale factorial benchmark design. As LLMs are increasingly deployed in citation-augmented settings (RAG, research assistants), understanding epistemic susceptibility has broad implications for AI safety, trust, and deployment across multiple high-stakes domains. The finding that even fabricated citations increase hallucination rates is highly actionable. Paper 2, while technically solid, addresses a more niche problem in time series forecasting with incremental methodological contributions. Paper 1's breadth of impact across AI safety, NLP, and responsible AI gives it higher potential influence.
ADWM addresses a critical and timely problem—offline evaluation of LLM agents—which is highly relevant given the explosive growth of LLM agent deployment. It introduces a novel combination of autoregressive and diffusion modeling for world models in OPE, bridging two major research areas (diffusion models and LLM agents). The breadth of impact is larger as it connects reinforcement learning, generative modeling, and LLM research communities. Paper 2, while solid and practical for irregular time series, addresses a more incremental advancement in a narrower domain with less transformative potential.
Paper 1 addresses a fundamental and overlooked assumption in time series forecasting—that future observation timestamps are known—which affects a broad range of real-world applications (IoT, industrial systems, healthcare). It introduces a novel problem formulation (joint observability and value prediction), a new framework (Timeflies), a benchmark (Shadow), and a new metric (OVJE), representing a more comprehensive scientific contribution. Paper 2, while solid in robotics affordance reasoning, targets a narrower domain. Paper 1's broader applicability across fields and its identification of a pervasive blind spot in existing methodology gives it higher potential impact.
Paper 2 addresses the highly timely and rapidly growing field of reasoning LLMs and RL post-training, providing mechanistic understanding of a poorly understood but widely adopted practice. Its insights into strategy selection and strategy improvement have broad implications for scaling reasoning capabilities across the AI community. Paper 1 makes a solid contribution to time series forecasting with missing data, but addresses a more niche problem. Paper 2's findings are more likely to influence a larger research community and have broader methodological impact given the current explosion of interest in reasoning models.
Paper 2 likely has higher impact: it targets Mixture-of-Experts routing, a central bottleneck in scaling foundation models, making it timely and broadly relevant across NLP/vision systems. The proposed Manifold Power Iteration provides a clear design principle (align routers with experts’ principal singular directions) with theoretical convergence arguments and large-scale (1B–11B) pretraining evidence, suggesting methodological rigor and immediate applicability. Paper 1 is novel and practical for irregular time series, but its impact is more domain-specific and depends on adoption of a new benchmark/metric.
Paper 2 likely has higher scientific impact due to broader applicability and timeliness: forecasting under irregular/missing observations is ubiquitous across industrial monitoring, healthcare, IoT, and finance. Modeling future observability (existence) jointly with values addresses a practical inference-time gap in many current methods, and it contributes a public benchmark (Shadow) and an evaluation metric (OVJE), which can catalyze follow-on work and standardize comparisons. Paper 1 is methodologically rigorous and novel within causal HLTE estimation under low overlap, but its impact is narrower to specialized causal inference settings.
Paper 2 introduces a fundamentally new problem formulation for time series forecasting—jointly modeling whether observations will occur and what their values will be—which challenges a widespread implicit assumption across the field. It proposes a new framework (Timeflies), benchmark (Shadow), and evaluation metric (OVJE), contributing at multiple levels. This reframing has broad applicability across domains (IoT, healthcare, industrial systems). Paper 1, while technically solid, addresses a narrower optimization problem (quantization-aware decoding for reasoning LLMs) with incremental accuracy gains (~2 points) and hardware-specific kernel improvements, limiting its broader scientific impact.
Paper 1 addresses a fundamental flaw in irregular time series forecasting (the oracle assumption of future timestamps) and proposes a comprehensive framework, benchmark, and novel metric. Its contribution is highly novel and broadly applicable across diverse fields like healthcare, finance, and IoT. Paper 2 offers a valuable but domain-specific contribution for molecular diffusion models. The foundational nature and broader methodological impact of Paper 1 give it higher potential scientific impact.
Paper 2 addresses a fundamental problem in scientific discovery by extracting governing equations from noisy, high-dimensional data. Its theoretical guarantees and symbolic recovery capabilities offer broad, interdisciplinary impact across fields like physics and biology. While Paper 1 provides a highly practical solution for industrial time series forecasting, Paper 2's contribution to system identification and representation learning has broader and more profound implications for scientific advancement.