James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri
Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the first eigenspaces or powers of the adjacency matrix. However, the theoretical properties of these truncated PEs are unknown. In this work, we initiate the study of these truncated PEs. Theoretically, we show that, under truncation, several families of PEs are fundamentally different in expressive power. As a corollary, we show that truncated spectral PEs are no longer stronger than the 1-WL test. We also study a family of spectral PEs, the -harmonic distances, to highlight the differences in expressive power of even closely related truncated PEs. Finally, we experimentally show that a mix of truncated PEs is preferable to any single family on real-world datasets.
This paper addresses a significant gap between theory and practice in graph neural networks: while spectral and walk-based positional encodings (PEs) are theoretically equivalent in their complete forms, practitioners universally use *truncated* versions due to computational constraints (O(n³) for complete PEs). The paper initiates the formal study of truncated PEs and reveals that the theoretical equivalence breaks down under truncation—different families of truncated PEs carry fundamentally different structural information about graphs.
The key theoretical results are: (1) Truncated eigenspace projections (k-EP-WL) with Ω(n) eigenspaces can be weaker than the 1-WL test (Theorem 4.1), which is surprising and practically important since truncated eigenvectors are among the most popular PEs; (2) Constant-size versions of one PE family can distinguish graphs that Ω(n)-size versions of another cannot (Theorems 4.1, 4.2); (3) The effective resistance (on weighted graphs) is weaker than both adjacency-WL and EP-WL, disproving a conjecture by Zhang et al. (2023); (4) The k-harmonic distances are introduced as a bridge between spectral and walk PEs, with [2n]-harmonic distances achieving full spectral expressive power (Theorem 4.6).
The theoretical results are carefully constructed with explicit graph families (e.g., P_n × K_10 vs. P_n × K_5,5 for Theorem 4.1, cycles for walk encoding results). The proofs leverage well-established spectral graph theory and the WL framework, making them verifiable. The computer-aided proof for Theorem 4.3 using SymPy for exact symbolic computation is a pragmatic and rigorous approach to handling matrix pseudoinverses. Theorem 4.8's use of Descartes' rule of signs (via Lemma D.2) to bound the number of exceptional k-values is elegant.
However, Theorem 4.3 (resistance vs. adjacency WL) is only proven for weighted graphs, with the unweighted case left as a conjecture (B.1). This is a meaningful limitation since unweighted graphs are the more common setting. The authors are transparent about this gap.
The experimental validation is adequate but modest. BREC experiments confirm the theoretical expressivity hierarchy. The ZINC experiments (Table 2) support the practical recommendation of mixing PEs, though the improvements are relatively small and the experimental scope is limited to a few datasets. The ogbg-molhiv results add some breadth but remain preliminary.
Practical guidance: The paper's main practical takeaway—"if you truncate PEs, mix different families"—is simple, actionable, and supported by both theory and experiments. This could influence PE design choices in the graph transformer community.
Theoretical framework: The paper opens a new research direction by establishing that truncated PEs form a "distinct design space" rather than merely being approximations. This reframing could stimulate further investigation into optimal PE selection and truncation strategies.
k-harmonic distances as PEs: The introduction of k-harmonic distances (beyond effective resistance) as practical PEs, with the fast approximation algorithm (O(mk poly log n) time), could see adoption. The biharmonic distance's connection to edge centrality provides intuitive motivation.
Limitations of spectral PEs: Theorem 4.1 has immediate implications for graph transformer design—it formally shows that truncated eigenvectors alone cannot guarantee superiority over MPNNs, motivating the use of adjacency information alongside spectral PEs.
This paper is well-timed. Graph transformers are an active research area, and the gap between theoretical expressivity analyses (which assume complete PEs) and practical implementations (which use truncated PEs) has been growing. The paper directly addresses this disconnect. The results are relevant to anyone designing or using graph transformers with spectral or walk-based PEs.
1. Clear problem identification: The theory-practice gap for truncated PEs is well-motivated and genuinely important.
2. Strong negative results: Showing truncated spectral PEs can be weaker than 1-WL (Theorem 4.1) is surprising and has clear design implications.
3. Comprehensive comparison: The paper systematically compares spectral, walk, resistance, and k-harmonic PEs under truncation.
4. Practical algorithm: The fast approximation for k-harmonic distances makes the theoretical contributions immediately applicable.
5. Disproving conjectures: The paper disproves two conjectures from Zhang et al. (2023), advancing understanding of resistance distance expressivity.
1. Weighted vs. unweighted gap: The resistance distance results (Theorem 4.3, Corollary 4.1) are only for weighted graphs. The unweighted case—which is arguably more important practically—remains open.
2. Limited experimental scope: Only three datasets are tested, and the experimental improvements from mixing PEs are modest on ZINC (~0.01 MAE improvement). Larger-scale experiments on diverse benchmarks would strengthen the practical claims.
3. No systematic truncation analysis: While the paper shows truncated PEs differ, it doesn't provide guidance on *how much* truncation is acceptable or how to choose k optimally.
4. Expressivity vs. learnability: The paper acknowledges but doesn't deeply explore the gap between expressivity (ability to distinguish) and actual learning performance on downstream tasks.
5. Missing comparisons: No comparison with other PE families (e.g., random walk PE, heat kernels) in the truncated setting, nor with recent methods that combine structural and positional information.
This is a solid theoretical contribution that addresses a genuine and timely gap in the GNN literature. The key insight—that truncated PEs from different families are incomparable in expressive power—is both surprising and practically relevant. The theoretical results are rigorous, though the weighted-only limitation for resistance distance results is notable. The experimental validation, while supporting the theoretical claims, is somewhat limited in scope. The paper successfully opens a new research direction and provides clear practical guidance, though follow-up work on optimal truncation strategies and broader empirical validation would strengthen the impact.
Generated Jun 12, 2026
Paper 2 has higher potential impact due to its broader relevance to graph ML theory and practice: truncated positional encodings are ubiquitous across domains (molecules, social, knowledge graphs), and clarifying their expressivity under realistic computational constraints addresses a foundational gap. Its contributions are both theoretical (separation results; limits vs 1-WL; analysis of k-harmonic distances) and practical (guidance to mix PEs), likely influencing future GNN design and benchmarks. Paper 1 is useful and timely for molecular diffusion reliability, but is more domain-specific and post-hoc, with narrower cross-field influence.
Paper 1 addresses a highly timely and broadly impactful question—understanding the mechanics of RL post-training for reasoning models, which is central to current LLM development. Its mechanistic insights (strategy selection and improvement) offer practical guidance for scaling reasoning capabilities, relevant to a massive and growing research community. Paper 2 makes solid theoretical contributions on truncated positional encodings for GNNs, but addresses a more niche topic with narrower audience. The timeliness and breadth of impact of Paper 1, given the current explosion in reasoning model research, gives it higher estimated scientific impact.
Paper 1 has higher potential impact due to its broad applicability across scientific disciplines, such as physics and neuroscience, aligning strongly with the transformative 'AI for Science' paradigm. By extracting symbolic governing equations from noisy, high-dimensional data with theoretical guarantees, it addresses a fundamental problem in scientific discovery. While Paper 2 provides valuable theoretical insights into Graph Neural Network expressivity, its contributions are largely confined to the graph machine learning community, making it less interdisciplinary and transformative than Paper 1.
Paper 2 likely has higher scientific impact due to its broad relevance to the rapidly expanding GNN literature and its timely clarification of a widely used but theoretically underexplored practice: truncating positional encodings for scalability. It contributes fundamental theory (separating expressive power under truncation; showing truncated spectral PEs are not stronger than 1-WL) plus empirical guidance (mixing PEs helps), influencing both future theory and practical model design across many domains using graphs (chemistry, biology, recommender systems). Paper 1 is strong and applied, but its impact is more specialized to robotics affordance reasoning.
Paper 1 addresses a fundamental theoretical gap in understanding truncated positional encodings for GNNs, which are widely used in practice but poorly understood theoretically. It provides novel separation results showing truncated PEs differ fundamentally in expressive power, with practical implications for PE selection. Paper 2 proposes an incremental improvement to LLM alignment (SSPO), building on GRPO with soft gating functions—a useful but more incremental contribution in a rapidly evolving and crowded space. Paper 1's theoretical contributions have broader and more lasting impact across the graph learning community.
Paper 2 addresses a fundamental theoretical gap in understanding truncated positional encodings for GNNs, which are widely used in practice but poorly understood theoretically. It provides novel theoretical results showing that truncation fundamentally changes expressive power relationships, with practical implications for GNN design. Paper 1, while addressing an important problem (LLM unlearning), proposes an incremental methodological contribution in a crowded field. Paper 2's theoretical contributions have broader and more lasting impact on the graph learning community, bridging a significant theory-practice gap that affects many downstream applications.
Paper 2 addresses a critical and ubiquitous bottleneck in modern AI: the high deployment costs of large foundation models like LLMs and ViTs. By enabling a 'train-once, deploy-everywhere' paradigm through nested low-rank decomposition, its practical applicability and breadth of impact across diverse fields significantly exceed Paper 1, which offers important but more niche theoretical advancements strictly within the domain of Graph Neural Networks.
Paper 2 likely has higher scientific impact due to its foundational theoretical contribution: it clarifies how commonly used truncated positional encodings for GNNs differ in expressive power, overturning assumptions from the “complete PE” equivalence and yielding crisp corollaries (e.g., truncated spectral PEs not exceeding 1-WL). This directly affects broad GNN practice (most deployments use truncation for scalability), informs model design across domains using graphs, and is timely given the centrality of PEs. Paper 1 is impactful for agentic RL, but is more incremental and application-specific.
Paper 1 addresses a fundamental theoretical gap in understanding truncated positional encodings for GNNs, which are widely used in practice but poorly understood theoretically. It provides rigorous expressivity results with practical implications for PE design. Paper 2's RL-based adversarial defense, while interesting, faces significant concerns: adaptive attacks specifically designed to handle gradient disruption may overcome the defense, and the adversarial robustness community has repeatedly shown that gradient-masking defenses (which this resembles) tend to provide false security. Paper 1's theoretical contributions are more lasting and foundational.
Paper 2 likely has higher scientific impact: it addresses a broadly used GNN component (positional encodings) and fills a clear theoretical gap about truncated variants that are standard in practice, yielding general expressivity results and guidance applicable across many graph domains (chemistry, biology, social networks, recommender systems). Its combination of theory plus empirical validation enhances rigor and uptake. Paper 1 is practically strong for robotics imitation learning, but its core idea (Fourier features) is an adaptation of a known technique and its impact is narrower to 3D manipulation pipelines.