Yosuke Yamaguchi, Issei Suemitsu, Yuki Kajihara, Wenpeng Wei
Pretrained time series foundation models (TSFMs) have enabled zero-shot forecasting on unseen target series. However, existing TSFMs often incur high computational cost and provide limited support for diverse variable types, often failing to account for covariates that exogenously influence target variability. To address these challenges, we propose CITRAS-FM, a tiny 7M-parameter TSFM that supports univariate, multivariate, and covariate-informed zero-shot forecasting with real-time CPU inference. Built on a patch-based, decoder-only Transformer, CITRAS-FM introduces Shifted Attention into the cross-variate module to effectively exploit known covariates accessible throughout the forecast horizon. Moreover, to enable covariate-aware pretraining despite the scarcity of covariate-rich corpora, we propose CovSynth, which synthesizes realistic covariates from decomposed components of target series. Experiments on fev-bench, spanning 100 tasks across various settings, demonstrate that CITRAS-FM achieves state-of-the-art zero-shot accuracy among sub-10M TSFMs while delivering sub-0.1-second CPU inference, offering a strong balance between forecasting accuracy and real-time deployability.
CITRAS-FM addresses two practical limitations of existing time series foundation models (TSFMs): (1) high computational cost that precludes deployment on resource-constrained devices, and (2) limited support for covariates in zero-shot forecasting settings. The paper proposes a 7M-parameter TSFM that supports univariate, multivariate, and covariate-informed zero-shot forecasting with sub-0.1-second CPU inference.
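Two specific technical contributions stand out: Shifted Attention, introduced into the cross-variate module to exploit known covariates that are accessible throughout the forecast horizon, and CovSynth, which synthesizes realistic covariates from decomposed components of target series so that covariate-aware pretraining is possible despite the scarcity of covariate-rich corpora.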
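The CovSynth idea, as the abstract describes it, is to synthesize covariates from decomposed components of a target series. The sketch below is only a rough illustration of that general idea, not the authors' procedure, which this review does not specify. The additive moving-average decomposition, the random mixing weights, the noise term, and the names `decompose` and `synth_covariate` are all assumptions made for the example.

```python
import numpy as np


def decompose(y, season):
    """Additive split of y into trend, seasonal, and residual parts.

    Assumption: a plain moving-average decomposition; the paper may use
    a different method.
    """
    kernel = np.ones(season) / season
    pad_left = season // 2
    pad_right = season - 1 - pad_left
    # Edge-pad so the moving-average trend has the same length as y.
    padded = np.pad(y, (pad_left, pad_right), mode="edge")
    trend = np.convolve(padded, kernel, mode="valid")
    detrended = y - trend
    # Average each position within the season to get a repeating profile.
    profile = np.array([detrended[i::season].mean() for i in range(season)])
    seasonal = np.tile(profile, len(y) // season + 1)[: len(y)]
    residual = y - trend - seasonal
    return trend, seasonal, residual


def synth_covariate(y, season, rng, noise_scale=0.1):
    """Build one synthetic covariate correlated with y.

    Hypothetical mixing rule: random non-negative weights on the three
    components plus Gaussian noise scaled by the standard deviation of y.
    """
    trend, seasonal, residual = decompose(y, season)
    w = rng.uniform(0.0, 1.0, size=3)
    cov = w[0] * trend + w[1] * seasonal + w[2] * residual
    return cov + noise_scale * np.std(y) * rng.standard_normal(len(y))


# Toy usage: a trending daily-seasonal series sampled hourly.
rng = np.random.default_rng(0)
t = np.arange(96)
y = 0.05 * t + np.sin(2 * np.pi * t / 24)
covariate = synth_covariate(y, season=24, rng=rng)
```

Because the covariate is built from the target's own components, it is informative about the target by construction, which is what makes it usable as a pretraining signal when real covariate-rich data is scarce.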
The evaluation uses fev-bench, a comprehensive benchmark with 100 tasks, which is a strength compared to cherry-picked datasets. The categorization into fev-all, fev-cov, fev-multi, and fev-uni provides useful granularity. The choice of Scaled Quantile Loss and skill scores relative to SeasonalNaive is standard and appropriate.
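To make the evaluation metrics concrete, here is a minimal sketch of a scaled quantile loss and a skill score relative to a baseline such as SeasonalNaive. It assumes the common definitions: pinball loss averaged over quantile levels, scaled by the in-sample seasonal naive absolute error, and skill computed as one minus the geometric mean of per-task error ratios. The exact aggregation used by fev-bench may differ in details such as clipping, and the function names are illustrative.

```python
import numpy as np


def pinball_loss(y_true, y_pred_q, q):
    """Mean pinball (quantile) loss for a single quantile level q."""
    diff = np.asarray(y_true) - np.asarray(y_pred_q)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


def seasonal_naive_scale(y_hist, season):
    """In-sample mean absolute error of the seasonal naive forecast."""
    y_hist = np.asarray(y_hist)
    return float(np.mean(np.abs(y_hist[season:] - y_hist[:-season])))


def scaled_quantile_loss(y_true, preds_by_q, y_hist, season):
    """Pinball loss averaged over quantile levels, divided by the scale."""
    losses = [pinball_loss(y_true, p, q) for q, p in preds_by_q.items()]
    return float(np.mean(losses)) / seasonal_naive_scale(y_hist, season)


def skill_score(model_errors, baseline_errors):
    """1 - geometric mean of per-task ratios (model error / baseline error)."""
    ratios = np.asarray(model_errors, float) / np.asarray(baseline_errors, float)
    return 1.0 - float(np.exp(np.mean(np.log(ratios))))
```

Under this definition, a skill score of 0 means parity with the baseline, positive values mean the model beats it, and a model that halves the baseline error on every task scores 0.5.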
However, several methodological concerns arise:
The practical impact potential is notable:
The impact is somewhat bounded by the fact that the model does not outperform larger models overall: Chronos-2, TiRex, and TimesFM-2.5 all achieve higher fev-all scores. The contribution lies primarily in the efficiency-accuracy tradeoff space, which is important but more niche than a general accuracy improvement.
The paper is highly timely. TSFMs are a rapidly evolving area, and the push toward efficient, deployable models is a current priority. The covariate handling problem is genuine: most TSFMs ignore covariates, which limits their practical utility. The paper appears alongside concurrent work (Chronos-2, COSMIC, Toto), all released in 2025, which positions it well in the current discourse.
Acceptance at EUSIPCO 2026 (a signal processing conference rather than a top ML venue) somewhat limits the paper's visibility, though the arXiv availability helps.
The paper builds directly on the authors' prior CITRAS work, making it somewhat incremental. The transition from a supervised model to a foundation model is the main advance. The contribution would be strengthened by demonstrating CovSynth's utility when applied to other TSFMs, and by a more thorough analysis of when covariate information helps versus hurts.
Generated Jun 10, 2026
Paper 1 addresses a critical bottleneck in time series foundation models by enabling high accuracy and covariate integration with minimal computational overhead (7M parameters, CPU inference). This offers immense practical utility and real-world deployability across diverse industries compared to Paper 2, which, while methodologically rigorous and impactful for scientific machine learning, targets a more specialized domain of PDE solving.
Paper 1 addresses a fundamental and timely challenge in AI safety—how to maintain oversight of increasingly capable AI systems. The bootstrapped monitoring framework introduces a novel and practically important concept: using untrusted but transparent reasoning chains to bridge capability gaps in AI oversight. This has broad implications for AI alignment and governance as models scale. Paper 2, while solid engineering work on efficient time series models, represents more incremental progress in a crowded field. The AI safety/control problem Paper 1 tackles is more urgent and cross-cutting, with higher potential to influence policy and practice.
Paper 1 presents a highly efficient, tiny (7M parameter) time series foundation model that supports real-time CPU inference and novel covariate integration. Its focus on zero-shot forecasting with low computational cost addresses a major bottleneck in deploying foundation models, giving it massive potential for broad, real-world applications across diverse industries. While Paper 2 offers a strong methodological advance in graph structure discovery using diffusion priors, Paper 1's timely contribution to efficient foundation models and immediate practical deployability suggest a broader and higher overall scientific impact.
Paper 1 has higher potential scientific impact because it addresses a fundamental challenge in deep learning: developing scalable subquadratic alternatives to Transformers. By rigorously comparing leading architectures (xLSTM, Mamba-2) across diverse, complex domains (code, time-series) and providing a unified theoretical formulation explaining xLSTM's superiority in state tracking, it offers broad foundational insights. Paper 2 is highly practical and efficient but focuses on a narrower niche (tiny time-series forecasting models), making its potential impact more domain-specific compared to the general architectural implications of Paper 1.
Paper 1 addresses a fundamental gap in PINN reliability by providing rigorous two-sided error bounds—both lower and upper—which is novel and important for trustworthy scientific computing. The theoretical contribution (computable a posteriori certificates without access to exact solutions) has broad implications for the growing PINN community and could influence how neural network-based PDE/ODE solvers are validated and certified. Paper 2, while practically useful, is more incremental—a smaller foundation model with covariate support—competing in a crowded TSFM landscape. Paper 1's methodological rigor and foundational nature give it higher long-term scientific impact.
Paper 2 (APPO) is likely to have higher scientific impact due to broader cross-field relevance and timeliness: fine-grained credit assignment and branching in agentic RL directly targets a central bottleneck in LLM-agent training and can generalize across tool use, planning, and sequence decision-making. The method introduces a principled branching score and advantage scaling, evaluated on 13 benchmarks with consistent gains, suggesting methodological rigor and adoption potential. Paper 1 is valuable and practical, but its impact is narrower (time-series forecasting) and more incremental within TSFMs.
Paper 1 introduces a fundamentally new theoretical concept (epistemic calibration) that addresses a critical gap in uncertainty quantification for machine learning. It provides formal definitions, impossibility theorems, consistent estimators, and broad experimental validation. This foundational contribution has potential to reshape how epistemic uncertainty is evaluated across many domains, especially high-stakes applications. Paper 2, while practically useful, is more incremental—proposing a smaller, efficient time series model with covariate support, representing engineering optimization rather than conceptual innovation. Paper 1's theoretical depth and broad applicability give it higher long-term scientific impact.
CITRAS-FM addresses a more specific and underexplored gap in time series foundation models—covariate-informed zero-shot forecasting with a tiny model. Its novel contributions (Shifted Attention, CovSynth for synthetic covariate generation, and comprehensive benchmarking on 100 tasks) represent meaningful methodological innovations. Paper 2, while practical, primarily combines existing techniques (LoRA, budget forcing via RL, adapter switching) for on-device reasoning without introducing fundamentally new methods. Paper 1's contributions to the rapidly growing TSFM field, especially enabling covariate-aware pretraining and achieving SOTA with only 7M parameters, have broader research implications.
Paper 1 addresses a broad and highly impactful domain (time-series forecasting) with a novel, ultra-compact foundation model capable of zero-shot covariate-informed predictions. Its ability to run in real-time on CPUs and its innovative data synthesis method (CovSynth) offer vast real-world applications across finance, logistics, and healthcare. While Paper 2 provides crucial methodological insights for the EEG/BCI community, Paper 1's contributions have wider cross-disciplinary applicability and immediate practical utility in edge deployment.
Paper 1 addresses a critical bottleneck in time series foundation models by introducing a highly efficient, CPU-deployable 7M-parameter model. Its ability to incorporate covariates and perform zero-shot forecasting offers massive real-world utility across diverse industries. While Paper 2 provides rigorous theoretical advancements in bandit learning, Paper 1's combination of efficiency, broad applicability, and timeliness in the foundation model landscape gives it a higher potential for widespread scientific and practical impact.