Mariya Pavlova, Harrison Bo Hua Zhu, Elizaveta Semenova, Yingzhen Li
We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a discrete-time dynamical system, TQS characterizes how quantization-induced errors propagate and amplify over the rollout horizon. Unlike conventional PTQ methods, where sensitivity analysis is often coupled to the quantization procedure, TQS enables a priori sensitivity estimation decoupled from quantizer selection and bit-width assignment. This separation allows for quantization budget planning even for black-box or compiled networks with fused operators. Building on this, we present TQS-PTQ, a flexible mixed-precision framework that requires no calibration data or costly second-order approximations. Our experiments show that a dynamical-systems perspective provides a robust, high-performing pathway for low-precision deployment in resource-constrained settings.
The paper introduces TQS, a sensitivity metric that reframes post-training quantization (PTQ) for time-series models as a finite-horizon dynamical-systems stability problem. The key insight is that autoregressive forecasting models are already discrete-time dynamical systems, so quantization-induced perturbations can be analyzed through the lens of Lyapunov exponents, which measure how errors propagate and amplify over rollout horizons. The main novelty is the decoupling of sensitivity estimation from quantizer design and bit-width selection. TQS computes a per-layer score via forward-pass-only perturbation rollouts, enabling *a priori* budget planning without calibration data, Hessians, or gradients. This makes it applicable to black-box or compiled models (demonstrated on Pangu-Weather's frozen ONNX export). The resulting TQS-PTQ framework feeds these scores to a mixed-precision allocator (a multiple-choice knapsack (MCKP) formulation or a greedy heuristic) that assigns bit-widths under a target compression budget.
The methodology is well-formulated. The TQS score (Eq. 1) is a finite-time Lyapunov exponent variant that normalizes trajectory divergence by perturbation magnitude and averages over context windows. The paper thoughtfully introduces both a quantization-residual probe (γ_quant) and a Gaussian proxy (γ_gauss), then systematically studies their agreement across granularities (Table 6), finding near-perfect correlation at block level (Spearman ρ=0.96 on Pangu) but divergence at tensor level. This validates the Gaussian proxy for block-level (deployment-scale) planning; given the tensor-level divergence, it should not be assumed to transfer to finer granularities.
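As a hedged illustration of the score described above, assume that γ for one layer is the mean over context windows of (1/T)·log(‖x̃_T − x_T‖ / ‖δ‖), where δ is the weight perturbation injected into that layer and x̃_T, x_T are the perturbed and clean states after T autoregressive steps. This assumed form may differ from the paper's Eq. 1 in normalization details. The toy map `make_step`, the probes `gauss_probe` (Gaussian proxy) and `rtn_residual_probe` (a symmetric per-tensor round-to-nearest residual standing in for the quantization-residual probe), and all sizes are hypothetical; the point of the sketch is that only forward rollouts are needed.

```python
import math
import numpy as np

def make_step(weights):
    """Hypothetical toy autoregressive map x_{t+1} = W2 @ tanh(W1 @ x_t)."""
    W1, W2 = weights
    return lambda x: W2 @ np.tanh(W1 @ x)

def rollout(step, x0, T):
    """Apply `step` T times, feeding each output back in as the next input."""
    x = x0
    for _ in range(T):
        x = step(x)
    return x

def gauss_probe(w, rng, sigma=1e-3):
    """Gaussian proxy probe: i.i.d. noise with standard deviation sigma."""
    return sigma * rng.standard_normal(w.shape)

def rtn_residual_probe(w, rng, bits=4):
    """Quantization-residual probe Q(w) - w for symmetric per-tensor round-to-nearest.
    `rng` is unused; it only keeps the probe signature uniform."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax  # assumes w is not all zeros
    q = np.clip(np.round(w / scale), -qmax, qmax) * scale
    return q - w

def ftle_layer_score(weights, layer, contexts, T, probe, rng):
    """Mean over contexts of (1/T) * log(||perturbed x_T - clean x_T|| / ||delta||)."""
    clean_step = make_step(weights)
    vals = []
    for x0 in contexts:
        delta = probe(weights[layer], rng)
        perturbed = list(weights)
        perturbed[layer] = weights[layer] + delta
        gap = np.linalg.norm(rollout(make_step(perturbed), x0, T) - rollout(clean_step, x0, T))
        # Floor the gap so a layer whose perturbation never reaches the output
        # gets a large negative (but finite) score instead of log(0).
        vals.append(math.log(max(gap, 1e-300) / np.linalg.norm(delta)) / T)
    return float(np.mean(vals))

rng = np.random.default_rng(0)
d, T = 8, 20
weights = [rng.standard_normal((d, d)) / np.sqrt(d), rng.standard_normal((d, d)) / np.sqrt(d)]
contexts = [rng.standard_normal(d) for _ in range(16)]
gauss_scores = [ftle_layer_score(weights, k, contexts, T, gauss_probe, rng) for k in range(2)]
quant_scores = [ftle_layer_score(weights, k, contexts, T, rtn_residual_probe, rng) for k in range(2)]
print("gauss:", gauss_scores, "quant:", quant_scores)
```

In this toy, a layer whose perturbation never reaches the output hits the 1e-300 floor and receives a very negative score whose exact value depends on that floor and on T, which shows how a fixed dead-layer cutoff can end up tied to implementation choices.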
The experimental evaluation spans three architecturally distinct models — TimesFM-2.5 (200M, time-series), Aurora-small (113M, weather), and Pangu-Weather (277M, ONNX-only weather) — providing breadth. Baselines include four established methods (RTN, GPTQ, GPTAQ, QEP) at matched compression ratios. The ablation grid covers probe distribution, allocator choice, bottom tier, and FP32 budget. Bootstrap confidence intervals on Aurora (Figure 8) strengthen statistical claims.
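A percentile bootstrap is one common way to obtain such confidence intervals. The sketch below is a minimal version under assumptions: the resampled unit is taken to be a per-forecast error difference (method minus baseline), and `percentile_bootstrap_ci`, the resample count, and the synthetic data are hypothetical, not the paper's setup or results.

```python
import numpy as np

def percentile_bootstrap_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    x = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample: len(x) indices drawn with replacement.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boot_means = x[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Hypothetical per-forecast error differences; synthetic, not data from the paper.
per_sample_diff = np.random.default_rng(1).normal(loc=-0.02, scale=0.05, size=200)
lo, hi = percentile_bootstrap_ci(per_sample_diff)
print(f"95% CI for mean difference: [{lo:.4f}, {hi:.4f}]")
```

An interval lying entirely below zero would indicate that the error reduction is robust to resampling of the evaluation set.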
However, several methodological aspects deserve scrutiny. The probe horizon is model-specific (T_max ∈ {4, 100, 120}) without clear justification beyond practical convergence. The dead-layer detection threshold appears somewhat ad hoc (γ ≤ -50 for TimesFM, the distribution minimum for Aurora). The MCKP objective uses γ·2^{-b} as the per-layer cost, which assumes that error shrinks exponentially with bit-width b (halving with each added bit) and scales linearly with γ; this simplification is not rigorously justified.
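To make the γ·2^{-b} cost concrete, here is a minimal sketch of a greedy marginal-gain allocator. It is a heuristic, not an exact MCKP solver and not the authors' implementation: every layer starts at the lowest tier, and the allocator repeatedly upgrades the layer with the largest reduction in Σ γ_l·2^{-b_l} per extra bit until the storage budget Σ n_l·b_l is exhausted. The tier set, layer sizes, scores, and the helpers `greedy_allocate` and `compression_vs_fp32` are hypothetical.

```python
def greedy_allocate(gammas, sizes, budget_bits, tiers=(2, 4, 8, 16)):
    """Greedy heuristic for: minimize sum(g * 2**-b) s.t. sum(n * b) <= budget_bits."""
    tiers = sorted(tiers)
    level = [0] * len(gammas)  # index into `tiers` for each layer
    used = tiers[0] * sum(sizes)
    if used > budget_bits:
        raise ValueError("budget cannot fit every layer at the lowest tier")
    while True:
        best, best_ratio = None, 0.0
        for l, (g, n) in enumerate(zip(gammas, sizes)):
            if level[l] + 1 == len(tiers):
                continue  # already at the highest tier
            b_now, b_next = tiers[level[l]], tiers[level[l] + 1]
            extra = n * (b_next - b_now)
            if used + extra > budget_bits:
                continue  # this upgrade would exceed the budget
            # Reduction in the proxy cost g * 2**(-b) per extra bit spent.
            ratio = g * (2.0 ** -b_now - 2.0 ** -b_next) / extra
            if ratio > best_ratio:
                best, best_ratio = l, ratio
        if best is None:
            break  # no affordable upgrade lowers the cost
        used += sizes[best] * (tiers[level[best] + 1] - tiers[level[best]])
        level[best] += 1
    return [tiers[k] for k in level]

def compression_vs_fp32(sizes, bits):
    """Storage ratio relative to keeping every parameter in FP32."""
    return 32 * sum(sizes) / sum(n * b for n, b in zip(sizes, bits))

gammas = [2.0, 0.5, -50.0]  # hypothetical scores; the last mimics a "dead" layer
sizes = [100, 100, 100]
bits = greedy_allocate(gammas, sizes, budget_bits=8 * sum(sizes))
print(bits, compression_vs_fp32(sizes, bits))  # [8, 8, 2] and a ratio of about 5.33
```

Because the γ values are computed once, calling `greedy_allocate` again with a different `budget_bits` yields another operating point without re-probing the model. Note also that under this cost a negative γ makes any upgrade increase the objective, so such layers stay at the lowest tier; whether that matches the paper's handling of dead layers is an assumption.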
Practical deployment: The calibration-free, gradient-free nature of TQS makes it immediately applicable to operational weather forecasting settings where models ship as compiled artifacts. The demonstrated applicability to Pangu's frozen ONNX graph is a genuine practical contribution — no other evaluated baseline handles this gracefully (QEP produces NaN).
Domain-specific insight: The finding that quantization sensitivity concentrates at I/O projection modules rather than FFN layers (the LLM convention) is a valuable architectural insight for the weather/time-series ML community. This challenges the direct transfer of LLM quantization heuristics.
Efficiency: The amortization property — one sensitivity sweep supporting multiple compression targets — provides meaningful computational savings (Table 1: 32-57.5 min/Pareto point vs. hours for baselines producing fewer points).
Limitations on impact: The models tested are relatively small (113-277M parameters), whereas the most pressing quantization needs are for billion-parameter models. The compression ratio achieved on Pangu (1.67× on-disk) is modest because of ONNX constraints. The approach is designed specifically for autoregressive time-series models and does not obviously extend to single-pass (non-autoregressive) inference settings.
The paper addresses a genuine and growing need. Weather foundation models are being deployed operationally (WMO Africa pilot cited), and resource-constrained settings (developing countries' meteorological services) need efficient inference. The intersection of PTQ and scientific time-series models is underexplored — most PTQ literature focuses on LLMs. The observation that physically inconsistent states may result from quantization (violating conservation laws) motivates domain-specific quantization approaches beyond pure statistical metrics.
The Hessian vs. TQS comparison (Appendix A.12) is informative: Hessian-based sensitivity correlates moderately with task-TQS-quant (ρ=0.47) but anti-correlates with the Gaussian variants, suggesting that TQS captures information a second-order criterion does not. The cross-dataset stability of TimesFM rankings (mean Spearman ρ=0.82) supports the claim of a model-intrinsic sensitivity structure. The 64/75 win rate over (variable, model) pairs is strong but somewhat inflated by the heavy-compression regime, where baselines struggle.
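The rank correlations quoted above (Spearman ρ) measure whether two score vectors order layers or blocks the same way. Below is a minimal sketch using average ranks for ties; `average_ranks`, `spearman_rho`, and the two score vectors are illustrative assumptions, not values from the paper. `scipy.stats.spearmanr` computes the same statistic if SciPy is available.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Illustrative per-block scores from two probes (hypothetical values).
gamma_quant = [0.9, 0.4, 1.3, -0.2, 0.1]
gamma_gauss = [1.0, 0.5, 1.1, -0.1, 0.3]
print(spearman_rho(gamma_quant, gamma_gauss))  # identical orderings, so rho is 1
```

A cross-dataset stability summary such as a mean ρ would then presumably be an average of such pairwise values over dataset pairs.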
Generated Jun 12, 2026
Paper 1 introduces a fundamentally novel theoretical framework by bridging dynamical systems stability with neural network quantization. While Paper 2 offers a highly practical systems-level engineering solution for LLM agents, Paper 1 provides a mathematically rigorous, decoupled metric (TQS) that eliminates the need for calibration data. This theoretical innovation has deeper potential scientific impact, offering broad methodological advancements for efficient AI deployment across edge computing, control systems, and time-series forecasting.
Paper 1 addresses a practical and timely problem—efficient quantization of time-series models for resource-constrained deployment—with a novel dynamical-systems perspective. It offers a concrete, usable framework (TQS-PTQ) with clear applications in edge AI and model compression. Paper 2 presents an interesting theoretical contribution connecting cup products from algebraic topology/gauge theory to neural networks, but its scope is narrow, the practical applications are unclear, and the audience is limited. Paper 1's broader applicability, methodological rigor, and relevance to the growing field of efficient deep learning give it higher potential impact.
Paper 2 is more likely to have higher scientific impact: it introduces a broadly applicable, theoretically grounded metric (TQS) linking quantization to dynamical-systems stability, enabling a priori sensitivity estimation decoupled from specific PTQ choices and even usable for black-box/compiled models. This directly targets timely deployment constraints (edge/resource-limited inference) across many time-series and sequential models, with potential spillover to control and stability analysis. Paper 1’s ESE is innovative and useful for multi-system forecasting, but its impact is more domain-specific and depends on strong assumptions about equilibrium estimation.
Paper 1 addresses a critical and highly timely issue in AI alignment (RLHF), specifically the tension between helpfulness and harmlessness in LLMs. By providing mechanistic interpretability into reward models, it offers foundational insights that could broadly influence how safe and reliable AI systems are developed. While Paper 2 offers an innovative approach to model quantization, Paper 1's focus on AI safety and alignment has a broader potential impact across the rapidly expanding field of large language models and their real-world deployment.
Paper 1 addresses the highly active field of diffusion models, introducing a principled framework for better control and fairness. Generative AI control has broad applicability across multiple modalities. While Paper 2 presents a novel quantization metric using dynamical systems, it targets a more specialized domain. The broader applications, relevance to AI fairness, and significant improvements in generative modeling give Paper 1 a higher potential for widespread scientific impact.
Paper 2 is more likely to have higher impact due to broader applicability and timeliness: quantization for efficient deployment affects many time-series and sequential models across domains (edge/IoT, robotics, finance, healthcare). Its dynamical-systems framing for PTQ sensitivity is a novel, general metric that works a priori, decoupled from quantizer/bit-width choices, and can apply even to black-box/compiled networks—high practical value. The proposed mixed-precision PTQ without calibration data or second-order costs suggests strong real-world feasibility. Paper 1 is useful but narrower to molecular diffusion and post-hoc uncertainty filtering.
Paper 2 introduces a novel dynamical-systems perspective to post-training quantization, offering broad real-world applications in deploying efficient AI models on resource-constrained edge devices. Its calibration-free approach provides significant methodological advantages. While Paper 1 presents a solid improvement for neural PDE solvers, Paper 2 addresses a fundamental challenge in model compression with wider potential impact across multiple domains in deep learning and time-series analysis.
AuthorityBench addresses a critical and timely problem—how citation-based authority signals cause LLMs to hallucinate—with a large-scale, rigorously designed benchmark (220K prompts, factorial design, multiple domains). This has broad impact across AI safety, NLP, information retrieval, and policy, and is highly relevant as LLMs are increasingly deployed in citation-augmented settings. Paper 2 presents a novel quantization metric using dynamical systems theory, which is technically interesting but more niche in scope, primarily benefiting model compression practitioners. Paper 1's findings on epistemic vulnerabilities have wider interdisciplinary relevance and societal implications.
Paper 1 introduces a novel interdisciplinary framework connecting dynamical systems theory to neural network quantization, addressing a practical and timely problem in deploying models on resource-constrained devices. Its applicability to black-box and compiled networks with fused operators broadens its real-world impact significantly. The decoupling of sensitivity analysis from quantizer selection is a conceptual innovation with broad utility. Paper 2 makes solid theoretical contributions to online learning with capacity constraints, but addresses a more niche setting with primarily theoretical interest and narrower audience appeal.
Paper 2 introduces a fundamentally novel theoretical framework bridging dynamical systems and quantization, enabling data-free, decoupled sensitivity estimation. This has broad implications across the rapidly growing field of efficient model deployment and edge computing. While Paper 1 offers a strong, practical solution for tabular continual learning, Paper 2's theoretical innovation and elimination of calibration data provide a more foundational scientific contribution. Its approach could influence a wider range of architectures and hardware optimization strategies, yielding higher overall impact.