Lilan Peng, Yandi Liu, Qingren Yao, Chongshou Li, Tianrui Li
Spatio-temporal forecasting is crucial in diverse fields, such as transportation, climate, and energy. Urban spatio-temporal data exhibits temporal mirages: similar short-window inputs have divergent future trends, and vice versa. Existing spatio-temporal graph neural networks (STGNNs) cannot effectively identify such mirages. We argue that the core reason lies in the short-window inputs, which suffer from incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality. To bridge this gap, we develop Multi-Period Pattern Pre-training (MP3), a novel plug-and-play pre-training plugin for distinguishing temporal mirages. MP3 presents two core innovations: (1) Multi-period pattern learning is designed to learn multi-period patterns from long time series. Specifically, multi-period temporal modeling leverages edge convolution to identify different multi-period patterns. Multi-period spatial modeling uses a bottleneck projection and a global memory bank to capture heterogeneous global spatial relations efficiently. Cross-period pattern interaction employs a causality-enhanced Transformer to capture dependencies across different period patterns. (2) The plugin integrates seamlessly into existing STGNN backbones to strengthen their forecasting performance. Experiments on five STGNN baselines across five real-world datasets (including the large-scale CA dataset) verify the effectiveness, superior scalability, and strong adaptability of MP3, which brings consistent and robust performance improvements across all evaluated baselines. On average, MP3 reduces MAE by 4.7% and RMSE by 5.0%. The code is available at https://github.com/YAN-outlook/MP3.
MP3 introduces a plug-and-play pre-training module that extracts multi-period pattern knowledge from long-term spatio-temporal time series and injects it into existing STGNN backbones. The paper identifies a phenomenon termed "temporal mirage" — where similar short-window inputs lead to divergent future trends — and attributes it to three causes: incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality.
The core technical contributions include: (a) multi-period temporal modeling via edge convolution that decouples intra-period and inter-period variations from reshaped 2D time structures; (b) multi-period spatial modeling using bottleneck projection and a momentum-updated global memory bank for scalable spatial reasoning; and (c) a causality-enhanced Transformer with a DAG-based upper triangular mask to enforce unidirectional causal flow from stronger to weaker period patterns. The frozen pre-trained MP3 module is fused with short-term backbone features via a gating mechanism.
Strengths in experimental design: The paper evaluates on five datasets (PEMS03/04/07/08 and large-scale CA with 9,638 sensors) with five STGNN backbones (STGCN, GWNet, STWA, MSDR, STNorm), providing a comprehensive 5×5 evaluation grid. The ablation studies systematically remove or replace each component (edge convolution vs. 2D conv/linear, cross-period interaction variants, causal mask removal, period ordering perturbation, context embedding removal, fixed vs. dynamic periods). The parameter sensitivity analysis examines different window lengths with associated training costs.
The plug-and-play nature of MP3 is its strongest practical selling point. If the module truly generalizes across architectures and datasets with minimal hyperparameter tuning, it could become a standard component in spatio-temporal forecasting pipelines. The average improvements (4.7% MAE, 5.0% RMSE) are meaningful for traffic forecasting applications where small percentage gains translate to significant operational value.
The scalability demonstration on the CA dataset (9,638 nodes) is important for real-world deployment. The memory bank mechanism for maintaining global spatial context during batch training is a practical contribution that addresses a real engineering constraint.
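A momentum-updated memory bank of the kind described above can be sketched as follows. This is a minimal illustration, assuming an exponential moving average per node and a mean over slots as the global context; the class name, the momentum value, and the aggregation rule are hypothetical and may differ from the paper's implementation.

```python
class MemoryBank:
    """Keeps one embedding slot per node, refreshed by exponential moving average."""

    def __init__(self, num_nodes, dim, momentum=0.9):
        self.momentum = momentum
        self.slots = [[0.0] * dim for _ in range(num_nodes)]

    def update(self, node_ids, embeddings):
        """Blend fresh embeddings into the slots of the nodes in the current batch."""
        m = self.momentum
        for node_id, emb in zip(node_ids, embeddings):
            old = self.slots[node_id]
            self.slots[node_id] = [m * o + (1.0 - m) * e for o, e in zip(old, emb)]

    def global_context(self):
        """Average over all slots, including nodes absent from the current batch."""
        n = len(self.slots)
        return [sum(column) / n for column in zip(*self.slots)]


bank = MemoryBank(num_nodes=4, dim=2, momentum=0.5)
bank.update([1, 3], [[2.0, 4.0], [6.0, 8.0]])  # only nodes 1 and 3 are in this batch
context = bank.global_context()
```

The point of the design is that a batch touching only a subset of sensors can still read a summary of the whole network, at the cost of slightly stale entries for nodes that were not updated recently.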
However, three factors may limit the impact: (1) the approach is designed specifically for periodic spatio-temporal data and may not transfer to non-periodic domains; (2) the pre-training step adds computational overhead (input length T=1248, 100 epochs), which requires justification in deployment scenarios; (3) the method is validated only on traffic flow data, despite claims of applicability to the climate and energy domains.
The paper addresses a genuine need in the spatio-temporal forecasting community, where performance improvements from architectural innovations have plateaued. The pre-training paradigm for spatio-temporal data is an emerging and active area, and the focus on multi-period patterns fills a specific gap — most existing pre-training methods focus on generic spatio-temporal knowledge distillation rather than explicit periodicity modeling. The combination of FFT-based period discovery with structured temporal reshaping builds on TimesNet's approach but adapts it meaningfully for the spatial-temporal graph setting.
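The FFT-based period discovery and 2D reshaping mentioned above can be sketched in the TimesNet style. The example below uses a naive discrete Fourier transform from the standard library to stay dependency-free; the function names, the synthetic series, and the choice to truncate an incomplete trailing period are assumptions for illustration only.

```python
import cmath
import math


def dominant_periods(series, top_k=2):
    """Return the top_k period lengths ranked by spectral amplitude."""
    n = len(series)
    amplitudes = []
    for freq in range(1, n // 2 + 1):  # skip freq 0, the constant offset
        coeff = sum(
            x * cmath.exp(-2j * math.pi * freq * t / n)
            for t, x in enumerate(series)
        )
        amplitudes.append((abs(coeff), freq))
    amplitudes.sort(reverse=True)
    return [n // freq for _, freq in amplitudes[:top_k]]


def fold(series, period):
    """Reshape a 1D series into rows of one period each, dropping any remainder.

    Columns then index the position within a period (intra-period variation)
    and rows index successive periods (inter-period variation).
    """
    usable = len(series) - len(series) % period
    return [series[i:i + period] for i in range(0, usable, period)]


# Synthetic signal with a strong period of 12 steps and a weaker period of 4 steps.
series = [
    math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 4)
    for t in range(48)
]
periods = dominant_periods(series, top_k=2)
grid = fold(series, periods[0])
```

A production version would normally use a fast Fourier transform rather than this quadratic-time loop; the sketch only shows how the amplitude ranking yields both the period lengths and the strongest-to-weakest ordering.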
The writing quality is generally good but could be more concise. The "temporal mirage" terminology, while memorable, risks overstating the novelty — the fundamental issue (short context windows missing long-range dependencies) is well-known. The contribution is more in the specific solution architecture than in problem identification.
The code availability is a positive factor for reproducibility. The method's reliance on FFT for period discovery means it may struggle with irregularly sampled data or datasets with weak periodicity.
Generated Jun 12, 2026
Paper 2 is likely to have higher scientific impact: it targets broadly important spatio-temporal forecasting domains (transportation, climate, energy), proposes a general plug-and-play pretraining framework that integrates with multiple STGNN backbones, and demonstrates consistent gains across five baselines and five real-world datasets, suggesting robustness and wide adoptability. Paper 1 is technically novel but impacts a narrower slice of diffusion inference and reports modest speedups (up to 6.3%), making downstream real-world influence potentially more limited despite strong methodological contributions.
Paper 1 (MP3) addresses a broadly applicable problem in spatio-temporal forecasting with a plug-and-play framework validated across 5 backbones and 5 datasets including large-scale ones, demonstrating strong generalizability. The concept of 'temporal mirage' is novel and the multi-period pre-training approach has wide applicability across transportation, climate, and energy domains. Paper 2 (FedSAP) makes a solid but more incremental contribution to federated prototype learning with narrower scope (visual classification under non-IID). MP3's broader domain impact, stronger empirical validation, and architectural flexibility as a plugin give it higher potential impact.
Paper 1 introduces a fundamental, principled mathematical framework for controlling diffusion models, a highly prominent area in modern AI. By replacing heuristic methods with Jeffrey's rule, it offers broad foundational advancements for generative modeling, including fairness and quality improvements. In contrast, Paper 2 presents a domain-specific architectural plugin for spatio-temporal forecasting; while useful, its contribution is more incremental compared to the theoretical and widespread potential impact of Paper 1.
Paper 2 (MP3) likely has higher scientific impact due to broader real-world applicability (transportation, climate, energy), timeliness of spatio-temporal foundation/pre-training methods, and demonstrated plug-and-play gains across multiple backbones and datasets, suggesting strong adoption potential. Its pre-training plugin concept can generalize across forecasting tasks and STGNN architectures, increasing cross-field impact. Paper 1 is novel and practical for uncertainty estimation, but its scope is narrower (primarily classification/ensemble uncertainty) and impact depends on uptake relative to many existing uncertainty frameworks.
Paper 1 tackles a critical bottleneck in LLM inference (KV cache reuse in RAG and agentic workflows) with a highly elegant, low-code solution in a dominant framework (vLLM). Given the massive scale of current LLM deployments, its significant performance gains (up to 100x faster time-to-first-token for cached spans) offer immediate, wide-reaching practical and system-level impact. Paper 2 presents a solid, though more incremental, improvement (~5% gain) in spatio-temporal forecasting.
Paper 1 addresses a fundamental and timely problem in AI alignment—understanding internal conflicts between helpfulness and harmlessness objectives in reward models used for RLHF. This mechanistic interpretability work has broad implications for the safety and trustworthiness of large language models, a topic of enormous current interest. The findings about shared neurons and interference between objectives provide novel insights that could influence how alignment is approached. Paper 2 offers a solid engineering contribution to spatio-temporal forecasting with modest incremental improvements (~5%), but operates in a more narrow domain with less transformative potential.
Paper 2 (MP3) has higher estimated scientific impact due to several factors: (1) it addresses a well-defined and broadly applicable problem (spatio-temporal forecasting) across multiple domains (transportation, climate, energy); (2) it introduces the novel concept of 'temporal mirage' which provides conceptual clarity; (3) it is designed as a plug-and-play module compatible with multiple existing architectures, increasing adoption potential; (4) it demonstrates consistent improvements across 5 baselines and 5 datasets with clear quantitative gains; (5) code availability enhances reproducibility. Paper 1, while interesting, proposes a more incremental sampling-time modification with limited evaluation scope and less rigorous experimental validation.
Paper 2 has higher potential impact due to a clearer, broadly applicable contribution: a plug-and-play pre-training module that improves multiple STGNN backbones across diverse real-world datasets, addressing a well-motivated phenomenon (“temporal mirage”). Its design (multi-period temporal/spatial modeling + cross-period interaction) and consistent gains suggest stronger methodological rigor and transferability. It targets high-demand applications (transportation, climate, energy) and aligns with the current trend toward pretraining for time-series/graph forecasting. Paper 1 is solid but more niche and shows mixed competitiveness on real data (e.g., balanced clusters).
Paper 1 has higher likely scientific impact: it introduces a concrete, novel pre-training module (MP3) addressing an identified failure mode (“temporal mirage”) in spatio-temporal forecasting, with clear methodological contributions, plug-and-play compatibility across multiple STGNN backbones, and validated gains on multiple real-world datasets plus released code—supporting adoption and reproducibility. Its applications span transportation, climate, and energy forecasting, giving immediate real-world relevance. Paper 2 is a valuable conceptual framework but lacks a new algorithm and strong empirical validation, making near-term uptake and measurable impact less certain.
Paper 1 presents a novel, well-validated framework (MP3) for spatio-temporal forecasting with clear methodological contributions (multi-period pattern learning, plug-and-play architecture), extensive experiments across five datasets and five baselines, and broad applicability in transportation, climate, and energy domains. Paper 2 addresses an interesting but narrower problem of finding multiple interpretations in datasets, with experiments limited to a single dataset (METABRIC) and a less developed methodology. Paper 1's rigor, scalability, and breadth of impact significantly exceed Paper 2's contributions.