Toshiaki Koike-Akino, Jing Liu, Ye Wang
Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly reduce memory and computational costs. However, identifying implicit low-rank structures in large foundation models remains challenging due to their enormous scale and unstructured weight distributions. We propose an adaptive tensorization method that discovers inherent low-rank structure in a target tensor by index ordering. Experiments on weight and KV-cache compression demonstrate improved reconstruction quality compared to baselines.
EinSort introduces the idea that index reordering (specifically sorting) prior to tensor decomposition can dramatically expose latent low-rank structure in LLM weight matrices and KV caches. The key insight is simple yet compelling: sorting tensor entries rearranges values so that the resulting matrix becomes approximately low-rank, enabling far more efficient tensor network decomposition. The framework augments Einstein summation with reversible permutation operators, combining sorting with non-negative tensorization, gauge fixing, and flexible nonlinear mappings/reductions to manage the memory overhead of storing permutations.
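The sort-then-decompose idea can be illustrated with a small NumPy experiment. This is a minimal sketch on synthetic uniform data, not the paper's implementation: the helper `energy_in_top_k`, the 64×64 shape, and the rank-3 truncated SVD (standing in for a tensor network decomposition) are all assumptions made for illustration.

```python
import numpy as np

def energy_in_top_k(matrix, k):
    """Fraction of squared Frobenius norm captured by the k largest singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

rng = np.random.default_rng(0)
rows, cols = 64, 64  # illustrative shape, not taken from the paper
values = rng.uniform(0.0, 1.0, size=rows * cols)

# Baseline: reshape the values in their original (random) order.
unsorted_matrix = values.reshape(rows, cols)

# Global sort: `order` is the permutation that must be stored to undo the sort.
order = np.argsort(values)
sorted_matrix = values[order].reshape(rows, cols)

# Rank-3 truncated SVD of the sorted matrix (stand-in for a tensor network).
u, s, vt = np.linalg.svd(sorted_matrix, full_matrices=False)
approx_sorted = (u[:, :3] * s[:3]) @ vt[:3]

# Undo the sort to recover an approximation in the original index order.
restored = np.empty_like(values)
restored[order] = approx_sorted.ravel()

relative_error = float(np.linalg.norm(restored - values) / np.linalg.norm(values))
unsorted_energy = energy_in_top_k(unsorted_matrix, 3)
sorted_energy = energy_in_top_k(sorted_matrix, 3)

print(f"top-3 energy, unsorted: {unsorted_energy:.4f}")
print(f"top-3 energy, sorted:   {sorted_energy:.4f}")
print(f"relative reconstruction error after unsorting: {relative_error:.4f}")
```

The reconstruction is only usable because `order` is kept, which is the permutation storage cost the review returns to under limitations.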
The paper frames this as an overlooked degree of freedom in tensor network design — while prior work focused on topology, contraction order, and rank selection, index ordering was largely ignored. The connection to quantum CNOT/entanglement gates provides an elegant theoretical lens.
Theoretical foundations: Lemma 2.1 provides a clean result showing that sorted uniform random variables reshaped into a matrix are asymptotically rank-3. The extension to row-wise sorting (Lemma D.1, asymptotically rank-1) and the analysis of reduction operations (Proposition D.3) are well-constructed. Proposition D.2 importantly clarifies that *independent* row/column permutations cannot change singular values — the permutation must be axis-dependent (entangling), which is a crucial insight distinguishing EinSort from naïve permutation approaches like TQCompress.
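The distinction drawn by Proposition D.2 can be checked numerically. The sketch below uses a synthetic Gaussian matrix with an arbitrary 32×48 shape (both assumptions for illustration) and compares an independent row/column permutation against a row-wise sort, which is one example of an axis-dependent permutation because each row receives its own column order.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((32, 48))  # synthetic stand-in for a weight matrix

def singular_values(m):
    return np.linalg.svd(m, compute_uv=False)

# Independent permutations: one shared row order and one shared column order.
row_perm = rng.permutation(a.shape[0])
col_perm = rng.permutation(a.shape[1])
separable = a[row_perm][:, col_perm]

# Axis-dependent permutation: every row is sorted with its own column order.
entangled = np.sort(a, axis=1)

sv_original = singular_values(a)
sv_separable = singular_values(separable)
sv_entangled = singular_values(entangled)

# Independent permutations leave the spectrum untouched.
separable_unchanged = bool(np.allclose(sv_original, sv_separable))

# Share of squared Frobenius norm held by the leading singular value.
top1_original = float(sv_original[0] ** 2 / np.sum(sv_original ** 2))
top1_entangled = float(sv_entangled[0] ** 2 / np.sum(sv_entangled ** 2))

print("spectrum unchanged by independent permutations:", separable_unchanged)
print(f"top-1 energy, original:        {top1_original:.3f}")
print(f"top-1 energy, row-wise sorted: {top1_entangled:.3f}")
```

Both transformations preserve the Frobenius norm, since they only move entries around; only the axis-dependent one redistributes that energy toward the leading singular value.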
Weaknesses in rigor: The theoretical results apply to i.i.d. random variables, while LLM weights are highly structured and non-i.i.d. The bridge between theory (uniform/Gaussian distributions) and practice (pretrained weights) relies on heuristic arguments about random projections inducing approximate uniformity (Appendix D.2), which the paper acknowledges are non-rigorous. The empirical demonstration that sorting reduces effective rank (Fig. 2) is convincing, but the gap between theoretical guarantees and practical behavior remains.
Experimental evaluation: The experiments cover multiple models (Qwen3, Gemma3, Phi-3/4) and tasks (WT2 perplexity, GSM8K math reasoning, TextVQA, LIBERO robot manipulation). The LIBERO results (Tables 5-7) are particularly striking — EinSort maintains near-original performance while baselines collapse. However, the main paper experiments are relatively limited, with most detail relegated to appendices. The runtime analysis (Table 3) honestly reports 40-70% throughput degradation, which is a significant practical concern.
Direct applications: KV-cache compression for long-context LLM inference is a genuine bottleneck. Weight compression for edge deployment of VLAs is highly relevant. The VLA robot manipulation results showing near-lossless compression at 10-80% rates are compelling for real-world robotics.
Broader influence: The conceptual contribution — that index ordering is a first-class design variable in tensor networks — could influence how the community approaches tensor decomposition more broadly. The connection to quantum entanglement gates opens potential cross-pollination with quantum computing/tensor network communities.
Limitations on impact: The permutation memory overhead is a fundamental challenge. Even with sliced sorting, storing permutations requires 0.35-3.14 bits per parameter, which partially offsets compression gains. The paper acknowledges the lack of optimized CUDA kernels, and the runtime penalties (Table 3) suggest significant engineering work is needed for practical deployment. The hyperparameter search space (folding, slicing modes, power exponents, reduction operations, nonlinear mappings) is enormous, making it unclear how to apply EinSort to new settings without extensive tuning.
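To give a feel for why slicing matters for the permutation overhead, the following back-of-the-envelope sketch computes an information-theoretic lower bound. It assumes each slice of length n stores an arbitrary permutation at log2(n!) bits; this is an illustrative accounting, not the paper's, and the function name `permutation_bits_per_parameter` is hypothetical. The values it prints are not meant to reproduce the 0.35-3.14 range quoted above.

```python
import math

def permutation_bits_per_parameter(slice_len: int) -> float:
    """Lower bound: log2(slice_len!) bits amortized over slice_len entries."""
    if slice_len < 1:
        raise ValueError("slice_len must be positive")
    # lgamma(n + 1) = ln(n!), converted from nats to bits.
    log2_factorial = math.lgamma(slice_len + 1) / math.log(2.0)
    return log2_factorial / slice_len

for n in (2, 4, 8, 16, 64, 256):
    print(f"slice length {n:4d}: {permutation_bits_per_parameter(n):.2f} bits/param")
```

Under this assumption the per-parameter cost grows roughly like log2(n), so shorter slices are cheaper to store but expose less reordering freedom.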
The paper addresses a genuine need: as LLMs scale, memory-efficient inference becomes critical. KV-cache compression and weight decomposition are active research areas. The timing is good, as the community is actively exploring alternatives to quantization-only approaches. The VLA compression results are particularly timely given the rapid growth of embodied AI.
Additional observations: The paper is somewhat sprawling — the 38-page appendix contains significant content that arguably belongs in the main text (e.g., the VLA results are among the strongest but appear only in appendices). The main paper's 4-page experimental section feels compressed relative to the framework's complexity. The non-negative tensorization trick (keeping 1-bit sign separately) is a nice practical contribution that deserves more prominence.
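The sign-separation step mentioned above can be sketched as follows. This is an assumed minimal version on a synthetic matrix, showing only the split into a packed 1-bit sign mask plus a non-negative magnitude tensor and the exact reconstruction; how the paper then tensorizes the magnitudes is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal((16, 24)).astype(np.float32)  # synthetic weights

# Split into a 1-bit sign mask and a non-negative magnitude tensor.
negative = w < 0                      # True where the entry is negative
packed_signs = np.packbits(negative)  # flattens and stores 8 sign bits per byte
magnitude = np.abs(w)                 # non-negative target for tensorization

# Reconstruction: unpack the bits and reapply the signs.
unpacked = np.unpackbits(packed_signs, count=w.size).astype(bool).reshape(w.shape)
restored = np.where(unpacked, -magnitude, magnitude)

sign_bits_per_parameter = packed_signs.size * 8 / w.size
print("exact reconstruction:", bool(np.array_equal(restored, w)))
print("sign storage (bits/param):", sign_bits_per_parameter)
```

In a real pipeline `magnitude` would be replaced by its compressed approximation before the signs are reapplied, so the sign mask stays lossless while only the magnitudes are approximated.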
Generated Jun 9, 2026
Paper 1 makes a methodologically rigorous contribution to mechanistic interpretability by formally distinguishing co-activation-based circuit proposals from causally validated circuits. It introduces the closure test framework, evaluates across multiple architectures (dense and MoE), and provides nuanced findings (e.g., co-activation signals not surviving causal ablation in MoE models). This addresses a fundamental question in the rapidly growing interpretability field. Paper 2 presents a useful but more incremental compression technique (index reordering for tensorization), which, while practical, has narrower conceptual impact and less novelty compared to Paper 1's systematic framework for validating circuit discovery methods.
EinSort addresses the highly active and impactful area of LLM compression through a novel adaptive tensorization method based on index sorting. Given the enormous interest in making LLMs more efficient, this work has broad applicability across the deep learning community. Paper 2 provides a useful but narrower contribution—a practical mapping between privacy parameters—that serves as a convenience for practitioners in differential privacy but represents an incremental refinement rather than a fundamentally new capability. Paper 1's broader relevance to the LLM efficiency problem gives it higher potential impact.
Paper 2 likely has higher impact due to broad, immediate applicability: a reproducible, declarative framework for checkpoint manipulation addresses a common pain point across many subfields (LLM editing, upcycling, compression, debugging). Its focus on reliability (assertions, format abstraction, memory management) improves methodological rigor and enables more reproducible research and tooling ecosystems. Paper 1 is novel algorithmically (index-ordering-based adaptive tensorization) with clear compression benefits, but is narrower in scope and may depend on specific tensor-network assumptions and benchmark coverage.
Paper 2 addresses a critical bottleneck in mechanistic interpretability—predicting and mitigating side effects in feature steering. Its comprehensive evaluation across multiple state-of-the-art architectures and SAE variants provides foundational insights crucial for AI alignment and safety. While Paper 1 offers a valuable practical compression technique, Paper 2 provides a novel theoretical framework with profound implications for understanding and safely controlling large language models.
Paper 1 has higher potential impact due to timeliness and real-world applicability: adaptive tensorization for weight/KV-cache compression directly targets deployment bottlenecks of LLMs (memory, latency, cost) and could be broadly adopted. Its novelty—discovering low-rank structure via index ordering—offers a practical compression lever for foundation models. Paper 2 is conceptually interesting (fractal/Fourier generalization metric and optimizer) but risks narrower applicability and weaker external validity since results are on small vision benchmarks; generalization measures often struggle to transfer to modern large-scale settings.
EinSort addresses a highly timely and impactful problem—compressing large language models via tensor network decomposition with an adaptive tensorization method. LLM compression is a critical bottleneck affecting deployment, efficiency, and accessibility of foundation models, giving it broad applicability across AI/ML. Paper 2 is a replication and methodological critique of a specific airline clustering study, with narrower scope and limited generalizability. While methodologically sound, its contributions are incremental and domain-specific, whereas Paper 1 introduces a novel method with potential wide adoption in the rapidly growing LLM ecosystem.
Paper 1 demonstrates significantly higher scientific impact potential. It addresses a critical clinical problem—early prediction of treatment complications in cancer—using a large-scale dataset of nearly 3 million measurements, validates across multiple independent datasets (MIMIC-IV, MMRF CoMMpass), and shows practical utility without requiring new infrastructure. The breadth of 162 complications across 8 clinical categories, combined with interpretable biomarker masking analysis, offers both methodological rigor and direct clinical applicability. Paper 2, while technically interesting, addresses a narrower optimization problem (tensor compression for LLMs) with incremental improvements over baselines and limited demonstrated real-world impact.
Paper 2 introduces a broadly applicable spectral audit framework that reveals a previously underappreciated confound (aperiodic 1/f components) affecting deep learning across multiple physiological signal types (EEG, ECG), architectures, and tasks. Its implications are far-reaching: it could change standard practices for interpretable physiological AI, affecting clinical deployment and research methodology across neuroscience, cardiology, and ML. Paper 1 addresses tensor compression for LLMs—useful but incremental in a crowded field. Paper 2's cross-domain relevance, methodological rigor, and potential to reshape evaluation standards give it higher impact.
Paper 1 addresses a more impactful problem—LLM-based time series forecasting—with a novel causal framework (CVAformer) that disentangles invariant and dynamic components using causal intervention. It demonstrates broad applicability across multiple forecasting settings (long-term, short-term, few-shot, zero-shot) with extensive experiments. Paper 2 proposes a useful but narrower contribution on tensor network compression via index sorting. While relevant, it addresses a more incremental optimization problem with less breadth of impact and fewer demonstrated applications.
Paper 2 offers a novel, rigorous theoretical framework bridging control theory and continual learning. By providing stability and convergence guarantees for the stability-plasticity trade-off, it addresses a fundamental and persistent AI challenge (catastrophic forgetting). While Paper 1 is highly timely for LLM compression, Paper 2's foundational contributions are likely to yield a broader and more lasting scientific impact across various machine learning domains.