
Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

Mariya Pavlova, Harrison Bo Hua Zhu, Elizaveta Semenova, Yingzhen Li

cs.LG
#2918 of 5669 · cs.LG
Tournament Score: 1398±48
Win Rate: 63%
Wins: 10
Losses: 6
Matches: 16
Rating: 6.5/10
Significance: 6.5 · Rigor: 6.5 · Novelty: 7 · Clarity: 6.5

Abstract

We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a discrete-time dynamical system, TQS characterizes how quantization-induced errors propagate and amplify over the rollout horizon. Unlike conventional PTQ methods, where sensitivity analysis is often coupled to the quantization procedure, TQS enables a priori sensitivity estimation decoupled from quantizer selection and bit-width assignment. This separation allows for quantization budget planning even for black-box or compiled networks with fused operators. Building on this, we present TQS-PTQ, a flexible mixed-precision framework that requires no calibration data or costly second-order approximations. Our experiments show that a dynamical-systems perspective provides a robust, high-performing pathway for low-precision deployment in resource-constrained settings.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: Trajectory-Based Quantization Sensitivity Score (TQS)

1. Core Contribution

The paper introduces TQS, a sensitivity metric that reframes post-training quantization (PTQ) for time-series models as a finite-horizon dynamical systems stability problem. The key insight is that autoregressive forecasting models are already discrete-time dynamical systems, so quantization-induced perturbations can be analyzed through the lens of Lyapunov exponents — measuring how errors propagate and amplify over rollout horizons. The main novelty is the decoupling of sensitivity estimation from quantizer design and bit-width selection. TQS computes a per-layer score via forward-pass-only perturbation rollouts, enabling *a priori* budget planning without calibration data, Hessians, or gradients. This makes it applicable to black-box or compiled models (demonstrated on Pangu-Weather's frozen ONNX export). The resulting TQS-PTQ framework uses these scores in a mixed-precision allocator (MCKP or greedy) to assign bit-widths under a target compression budget.
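As a rough illustration of what a forward-pass-only perturbation probe could look like, the Python sketch below perturbs one weight of a toy two-layer autoregressive model, rolls the reference and perturbed models forward from the same context, and reports a finite-time Lyapunov-style growth rate averaged over contexts. The toy model, the names `rollout` and `sensitivity_score`, the relative noise scale `eps`, and the exact form of the score are all assumptions made for illustration; the paper's Eq. 1 is not reproduced in this assessment and may differ.

```python
import math
import random


def rollout(weights, x0, horizon):
    """Autoregressive rollout of a toy two-layer scalar model (hypothetical)."""
    x, trajectory = x0, []
    for _ in range(horizon):
        # Each step feeds the previous output back in as the next input.
        x = math.tanh(weights["out"] * math.tanh(weights["in"] * x))
        trajectory.append(x)
    return trajectory


def sensitivity_score(weights, layer, contexts, horizon, eps=1e-3, seed=0):
    """Assumed score: mean over contexts of (1/T) * log(divergence / |delta|).

    Uses forward passes only: no gradients, Hessians, or calibration labels.
    """
    rng = random.Random(seed)
    floor = 1e-300  # avoids log(0) when the two trajectories coincide
    rates = []
    for x0 in contexts:
        # Gaussian probe scaled relative to the weight magnitude.
        delta = rng.gauss(0.0, eps * abs(weights[layer]))
        if delta == 0.0:
            continue
        perturbed = dict(weights)
        perturbed[layer] += delta
        reference = rollout(weights, x0, horizon)
        disturbed = rollout(perturbed, x0, horizon)
        divergence = abs(disturbed[-1] - reference[-1])
        rates.append(math.log(max(divergence, floor) / abs(delta)) / horizon)
    if not rates:
        raise ValueError("no usable perturbations were drawn")
    return sum(rates) / len(rates)


# Example: score each "layer" of a strongly contracting toy model.
toy_weights = {"in": 0.1, "out": 0.1}
toy_contexts = [0.5, -0.3, 0.8]
toy_scores = {
    name: sensitivity_score(toy_weights, name, toy_contexts, horizon=20)
    for name in toy_weights
}
```

In this sketch a negative score means the perturbation decays over the rollout and a positive score means it is amplified. Because only `rollout` is called, the same loop could in principle wrap any opaque forward function.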

2. Methodological Rigor

The methodology is well-formulated. The TQS score (Eq. 1) is a finite-time Lyapunov exponent variant that normalizes trajectory divergence by perturbation magnitude and averages over context windows. The paper thoughtfully introduces both a quantization-residual probe (γ_quant) and a Gaussian proxy (γ_gauss), then systematically studies their agreement across granularities (Table 6), finding near-perfect correlation at block level (Spearman ρ=0.96 on Pangu) but divergence at tensor level. This validates the Gaussian proxy for deployment-scale planning.
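The agreement check between the two probes can be pictured as a Spearman rank correlation over per-layer scores. The sketch below is a minimal pure-Python version with average ranks for ties; the function names and the example scores are hypothetical and are not taken from the paper's Table 6.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for constant rankings")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical per-block scores from a quantization-residual probe
# and from a Gaussian proxy probe (made-up numbers).
gamma_quant = [0.9, 0.1, 0.4, 0.7, 0.2]
gamma_gauss = [0.8, 0.2, 0.3, 0.9, 0.1]
agreement = spearman_rho(gamma_quant, gamma_gauss)
```

A value near 1 would indicate that the cheaper proxy orders the blocks almost identically to the quantization-residual probe, which is the property a budget planner relies on.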

The experimental evaluation spans three architecturally distinct models — TimesFM-2.5 (200M, time-series), Aurora-small (113M, weather), and Pangu-Weather (277M, ONNX-only weather) — providing breadth. Baselines include four established methods (RTN, GPTQ, GPTAQ, QEP) at matched compression ratios. The ablation grid covers probe distribution, allocator choice, bottom tier, and FP32 budget. Bootstrap confidence intervals on Aurora (Figure 8) strengthen statistical claims.

However, several methodological aspects deserve scrutiny. The probe horizon is model-specific (T_max ∈ {4, 100, 120}) without clear justification beyond practical convergence. The dead-layer detection threshold appears somewhat ad hoc (γ ≤ -50 for TimesFM, distribution minimum for Aurora). The MCKP objective uses γ·2^{-b} as the cost function, which assumes an exponential relationship between bit-width and error — a simplification that isn't rigorously justified.
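To make the γ·2^{-b} cost concrete, the sketch below shows one way a greedy allocator might use it: every layer starts at the lowest bit-width, and the upgrade with the largest cost reduction per extra bit of storage is applied repeatedly until the budget is exhausted. This is an assumed illustration, not the paper's MCKP or greedy procedure. The function name, the layer names, and the numbers are hypothetical, and sensitivities are assumed to have been shifted to be non-negative.

```python
def greedy_allocate(sensitivity, sizes, bit_choices, budget_bits):
    """Assign a bit-width per layer under a total storage budget (in bits).

    The modeled error of layer l at bit-width b is sensitivity[l] * 2**(-b).
    sensitivity[l] is assumed non-negative; sizes[l] is a parameter count.
    """
    choices = sorted(bit_choices)
    level = {name: 0 for name in sensitivity}  # index into choices
    used = sum(sizes[name] * choices[0] for name in sensitivity)
    if used > budget_bits:
        raise ValueError("budget too small even at the lowest bit-width")

    while True:
        best_name, best_ratio = None, 0.0
        for name in sensitivity:
            i = level[name]
            if i + 1 >= len(choices):
                continue  # already at the highest precision
            extra = sizes[name] * (choices[i + 1] - choices[i])
            if used + extra > budget_bits:
                continue  # upgrade does not fit in the remaining budget
            gain = sensitivity[name] * (2.0 ** -choices[i] - 2.0 ** -choices[i + 1])
            ratio = gain / extra  # cost reduction per extra stored bit
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # no affordable upgrade reduces the modeled cost
        i = level[best_name]
        used += sizes[best_name] * (choices[i + 1] - choices[i])
        level[best_name] = i + 1

    return {name: choices[level[name]] for name in sensitivity}


# Hypothetical example: two small, sensitive projections and one large FFN.
example_sensitivity = {"input_proj": 8.0, "ffn": 1.0, "output_proj": 6.0}
example_sizes = {"input_proj": 100, "ffn": 1000, "output_proj": 100}
example_plan = greedy_allocate(
    example_sensitivity, example_sizes, bit_choices=[2, 4, 8], budget_bits=4000
)
```

Because the sensitivities are computed once and the budget is only an argument, the same scores can be reused for several compression targets by calling the allocator again with a different `budget_bits`.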

3. Potential Impact

Practical deployment: The calibration-free, gradient-free nature of TQS makes it immediately applicable to operational weather forecasting settings where models ship as compiled artifacts. The demonstrated applicability to Pangu's frozen ONNX graph is a genuine practical contribution — no other evaluated baseline handles this gracefully (QEP produces NaN).

Domain-specific insight: The finding that quantization sensitivity concentrates at I/O projection modules rather than FFN layers (the LLM convention) is a valuable architectural insight for the weather/time-series ML community. This challenges the direct transfer of LLM quantization heuristics.

Efficiency: The amortization property — one sensitivity sweep supporting multiple compression targets — provides meaningful computational savings (Table 1: 32-57.5 min/Pareto point vs. hours for baselines producing fewer points).

Limitations on impact: The models tested are relatively small (113-277M parameters), whereas the most pressing quantization needs exist for billion-parameter models. The compression ratios achieved on Pangu (1.67× on-disk) are modest due to ONNX constraints. The approach is specifically designed for autoregressive time-series models and doesn't obviously extend to single-inference settings.

4. Timeliness & Relevance

The paper addresses a genuine and growing need. Weather foundation models are being deployed operationally (WMO Africa pilot cited), and resource-constrained settings (developing countries' meteorological services) need efficient inference. The intersection of PTQ and scientific time-series models is underexplored — most PTQ literature focuses on LLMs. The observation that physically inconsistent states may result from quantization (violating conservation laws) motivates domain-specific quantization approaches beyond pure statistical metrics.

5. Strengths & Limitations

Key Strengths:

  • Elegant conceptual framing: treating PTQ as dynamical systems stability is natural and productive
  • Practical applicability to black-box/ONNX models — a genuine capability gap filled
  • Comprehensive cross-architecture analysis revealing the I/O sensitivity concentration principle
  • One-sweep amortization across compression targets is operationally valuable
  • Honest reporting of negative results (TQS loses at mid-compression W3 on TimesFM; QEP NaN on Pangu)

Notable Limitations:

  • Model scale is limited; whether TQS scales to billion-parameter models is unknown
  • The Pangu on-disk compression (1.67×) is underwhelming, constrained by the ONNX wrapper surface area
  • The dynamical systems connection, while conceptually appealing, is somewhat loose — TQS is essentially a forward-perturbation sensitivity score with Lyapunov-inspired normalization, not a rigorous stability analysis
  • No comparison with activation quantization or weight+activation mixed precision
  • The paper is a workshop submission with extensive appendices (24 pages), making the contribution-to-page ratio concerning for reproducibility assessment
  • Limited theoretical analysis — no formal guarantees on when TQS rankings will produce optimal allocations

Additional Observations:

The Hessian vs. TQS comparison (Appendix A.12) is informative: moderate correlation with task-TQS-quant (ρ=0.47) but anti-correlation with Gaussian variants suggests TQS captures genuinely different information. The cross-dataset stability of TimesFM rankings (mean Spearman ρ=0.82) supports the claim of model-intrinsic sensitivity structure. The 64/75 variable-model win rate is strong but somewhat inflated by the heavy-compression regime where baselines struggle.


    Generated Jun 12, 2026

    Comparison History (16)

    Won vs. Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

    Paper 1 introduces a fundamentally novel theoretical framework by bridging dynamical systems stability with neural network quantization. While Paper 2 offers a highly practical systems-level engineering solution for LLM agents, Paper 1 provides a mathematically rigorous, decoupled metric (TQS) that eliminates the need for calibration data. This theoretical innovation has deeper potential scientific impact, offering broad methodological advancements for efficient AI deployment across edge computing, control systems, and time-series forecasting.

    gemini-3.1-pro-preview·Jun 12, 2026
    Won vs. Adjusted Cup-Product Neural Layer

    Paper 1 addresses a practical and timely problem—efficient quantization of time-series models for resource-constrained deployment—with a novel dynamical-systems perspective. It offers a concrete, usable framework (TQS-PTQ) with clear applications in edge AI and model compression. Paper 2 presents an interesting theoretical contribution connecting cup products from algebraic topology/gauge theory to neural networks, but its scope is narrow, the practical applications are unclear, and the audience is limited. Paper 1's broader applicability, methodological rigor, and relevance to the growing field of efficient deep learning give it higher potential impact.

    claude-opus-4-6·Jun 12, 2026
    Won vs. Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation

    Paper 2 is more likely to have higher scientific impact: it introduces a broadly applicable, theoretically grounded metric (TQS) linking quantization to dynamical-systems stability, enabling a priori sensitivity estimation decoupled from specific PTQ choices and even usable for black-box/compiled models. This directly targets timely deployment constraints (edge/resource-limited inference) across many time-series and sequential models, with potential spillover to control and stability analysis. Paper 1’s ESE is innovative and useful for multi-system forecasting, but its impact is more domain-specific and depends on strong assumptions about equilibrium estimation.

    gpt-5.2·Jun 12, 2026
    Lost vs. Understanding helpfulness and harmless tension in reward models

    Paper 1 addresses a critical and highly timely issue in AI alignment (RLHF), specifically the tension between helpfulness and harmlessness in LLMs. By providing mechanistic interpretability into reward models, it offers foundational insights that could broadly influence how safe and reliable AI systems are developed. While Paper 2 offers an innovative approach to model quantization, Paper 1's focus on AI safety and alignment has a broader potential impact across the rapidly expanding field of large language models and their real-world deployment.

    gemini-3.1-pro-preview·Jun 12, 2026
    Lost vs. Towards More General Control of Diffusion Models Using Jeffrey Guidance

    Paper 1 addresses the highly active field of diffusion models, introducing a principled framework for better control and fairness. Generative AI control has broad applicability across multiple modalities. While Paper 2 presents a novel quantization metric using dynamical systems, it targets a more specialized domain. The broader applications, relevance to AI fairness, and significant improvements in generative modeling give Paper 1 a higher potential for widespread scientific impact.

    gemini-3.1-pro-preview·Jun 12, 2026
    Won vs. Uncertainty Estimation for Molecular Diffusion Models

    Paper 2 is more likely to have higher impact due to broader applicability and timeliness: quantization for efficient deployment affects many time-series and sequential models across domains (edge/IoT, robotics, finance, healthcare). Its dynamical-systems framing for PTQ sensitivity is a novel, general metric that works a priori, decoupled from quantizer/bit-width choices, and can apply even to black-box/compiled networks—high practical value. The proposed mixed-precision PTQ without calibration data or second-order costs suggests strong real-world feasibility. Paper 1 is useful but narrower to molecular diffusion and post-hoc uncertainty filtering.

    gpt-5.2·Jun 12, 2026
    Won vs. How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

    Paper 2 introduces a novel dynamical-systems perspective to post-training quantization, offering broad real-world applications in deploying efficient AI models on resource-constrained edge devices. Its calibration-free approach provides significant methodological advantages. While Paper 1 presents a solid improvement for neural PDE solvers, Paper 2 addresses a fundamental challenge in model compression with wider potential impact across multiple domains in deep learning and time-series analysis.

    gemini-3.1-pro-preview·Jun 12, 2026
    Lost vs. Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models

    AuthorityBench addresses a critical and timely problem—how citation-based authority signals cause LLMs to hallucinate—with a large-scale, rigorously designed benchmark (220K prompts, factorial design, multiple domains). This has broad impact across AI safety, NLP, information retrieval, and policy, and is highly relevant as LLMs are increasingly deployed in citation-augmented settings. Paper 2 presents a novel quantization metric using dynamical systems theory, which is technically interesting but more niche in scope, primarily benefiting model compression practitioners. Paper 1's findings on epistemic vulnerabilities have wider interdisciplinary relevance and societal implications.

    claude-opus-4-6·Jun 12, 2026
    Won vs. Capacity-Constrained Online Convex Optimization with Delayed Feedback

    Paper 1 introduces a novel interdisciplinary framework connecting dynamical systems theory to neural network quantization, addressing a practical and timely problem in deploying models on resource-constrained devices. Its applicability to black-box and compiled networks with fused operators broadens its real-world impact significantly. The decoupling of sensitivity analysis from quantizer selection is a conceptual innovation with broad utility. Paper 2 makes solid theoretical contributions to online learning with capacity constraints, but addresses a more niche setting with primarily theoretical interest and narrower audience appeal.

    claude-opus-4-6·Jun 12, 2026
    Won vs. TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

    Paper 2 introduces a fundamentally novel theoretical framework bridging dynamical systems and quantization, enabling data-free, decoupled sensitivity estimation. This has broad implications across the rapidly growing field of efficient model deployment and edge computing. While Paper 1 offers a strong, practical solution for tabular continual learning, Paper 2's theoretical innovation and elimination of calibration data provide a more foundational scientific contribution. Its approach could influence a wider range of architectures and hardware optimization strategies, yielding higher overall impact.

    gemini-3.1-pro-preview·Jun 12, 2026