Pratik Jawanpuria, Bamdev Mishra
Low-rank optimal transport (OT) mitigates the quadratic scaling of classical solvers, yet existing approaches rely heavily on first-order mirror-descent updates that require careful hyperparameter tuning and ignore the optimization landscape's curvature. To address these limitations, we propose a unified Riemannian geometric framework for low-rank OT, modeling balanced and unbalanced rank- positive factored couplings as novel smooth embedded submanifolds of the positive orthant. By equipping these manifolds with the Fisher-Rao product metric, we derive tractable formulations for Riemannian projectors, retractions, and Hessian-vector products. Our cost-agnostic framework seamlessly extends to linear OT, Gromov-Wasserstein (GW), fused GW, and their unbalanced counterparts. For balanced OT, our geometric ingredients are computed via efficient conjugate-gradient and iterative Bregman updates. For the unbalanced OT, our operations elegantly reduce to closed-form scalings, completely eliminating inner iterative loops. In both regimes, per-iteration complexity scales linearly with dataset size, and we provide a rank-sufficiency certificate for global optimality verification. Extensive experiments across a range of problem sizes demonstrate that our regularization-free first- and second-order solvers achieve faster convergence and superior performance over existing state-of-the-art low-rank OT solvers.
This paper introduces a unified Riemannian optimization framework for low-rank optimal transport (OT) by modeling the set of rank-*r* positive factored couplings as smooth embedded submanifolds of the positive orthant. The key insight is equipping these manifolds with the Fisher-Rao product metric, which naturally respects the positivity constraints and information geometry of the problem. This enables derivation of tractable Riemannian projectors, retractions, and Hessian-vector products—providing, for the first time, second-order optimization tools for low-rank OT.
The framework addresses a genuine limitation of existing approaches (LOT, LR-GW, UB-LOT), which rely on alternating mirror descent with Dykstra/Bregman projections and suffer from hyperparameter sensitivity, lack of curvature exploitation, and inner iterative loops. The most striking contribution is for the unbalanced setting, where all geometric operations reduce to closed-form column scalings, completely eliminating inner iterative loops—a qualitative structural improvement.
The mathematical development is thorough and well-structured. The manifold structure proofs (Theorem 1) correctly apply the constant-rank level set theorem. The retraction validity (Theorem 2), Hessian symmetry (Theorem 3), and rank-sufficiency certificate (Theorem 4) are all rigorously proven with complete proofs in the appendix.
The Hessian-vector product derivation (Equation 14) is elegant—it avoids differentiating through the projection operator by exploiting the Gauss equation for embedded submanifolds, yielding a computationally practical formula. The rank-sufficiency certificate (Theorem 4) with its quantitative suboptimality bound (Corollary 1) is a valuable theoretical contribution, though as acknowledged in Remark 2, the strict δ_r ≥ 0 condition is rarely satisfied for generic cost matrices in the open parametrization.
The experimental evaluation is extensive, covering balanced/unbalanced linear OT, GW, and FGW across problem sizes up to n=50,000. The comparison methodology is generally fair, though there are some concerns: (1) wall-clock comparisons are on CPU only, where Sinkhorn-style methods lose their GPU parallelism advantage; (2) the iteration count advantage (order-of-magnitude fewer iterations) doesn't always translate to wall-clock advantage, particularly for balanced OT at low ranks where mirror descent's cheaper per-iteration cost compensates.
Direct applications: The framework could benefit any application using low-rank OT as a building block—domain adaptation, shape matching, generative modeling, and prototype selection. The unbalanced setting's closed-form operations are particularly impactful for applications requiring robustness to outliers or partial matching.
Methodological influence: The paper demonstrates that Riemannian optimization with carefully chosen metrics (Fisher-Rao) can structurally simplify constrained optimization problems in computational OT. This geometric perspective could inspire similar treatments for other structured transport problems (e.g., multi-marginal OT, regularized variants).
Practical utility: The elimination of hyperparameter tuning (no step-size, no entropic regularization) is a significant practical advantage. Tables 17-20 compellingly demonstrate that baselines' performance varies by 13-32% across hyperparameter settings, while Riemannian solvers are parameter-free.
The paper addresses a current bottleneck in computational OT. Low-rank OT has gained significant traction since Scetbon et al. (2021), with extensions to GW (2022) and unbalanced settings (2023). However, all existing solvers remain first-order. The community's interest in scaling OT to larger problems while maintaining solution quality makes second-order methods timely. The cost-agnostic design is well-suited to the proliferating variety of OT formulations.
1. Architectural elegance: The clean separation between geometry (manifold, metric, projectors) and cost function is a genuine design achievement. Switching between OT variants requires only changing Euclidean derivatives.
2. Unbalanced closed-form operations: The most impactful result—reducing unbalanced projectors and retractions to closed-form—yields 30-70× speedups over UB-LOT in experiments.
3. Completeness: The paper provides all first- and second-order ingredients needed for practical optimization, including the often-neglected Hessian-vector product.
4. Comprehensive experiments: 18-configuration grids, sensitivity analyses across hyperparameters, and scaling studies up to n=50,000 provide convincing evidence.
5. Rank-sufficiency certificate: A practical post-hoc diagnostic with quantitative suboptimality bounds.
1. Balanced setting overhead: For balanced OT, the Schur complement CG solve and Sinkhorn retraction add per-iteration cost that can offset the fewer-iterations advantage, particularly at low ranks. Tables 2 and 11 show LOT sometimes wins on wall-clock time despite worse iteration counts.
2. Non-convex landscapes: For GW/FGW, the framework provides local convergence guarantees only. LR-GW occasionally finds better solutions (Table 3, r=20), suggesting the many mirror-descent iterations may explore the landscape more broadly despite being first-order.
3. GPU scalability not addressed: All experiments are CPU-only. Mirror descent with Sinkhorn-style updates is highly parallelizable on GPUs; it's unclear whether the Riemannian framework's advantages persist in GPU settings.
4. Open parametrization boundary effects: The strictly positive factors cannot represent the sparse couplings that are often optimal in classical OT. The rank-sufficiency certificate acknowledges this (Remark 2) but doesn't fully resolve it.
5. Limited comparison with FRLC: FRLC comparisons are relegated mostly to appendices, and FRLC appears to outperform at some configurations (Table 8, high rank).
This is a technically strong paper that provides a principled geometric foundation for low-rank OT optimization. The unbalanced closed-form results represent a clear advance, while the balanced setting offers more nuanced improvements. The framework's modularity and theoretical completeness (manifold structure through global optimality certificates) set a new standard for geometric approaches to OT. The primary limitation is that the practical speedup advantage is most convincing in the unbalanced regime and at higher ranks.
Generated Jun 11, 2026
Paper 1 likely has higher impact: it introduces a broadly applicable geometric framework (Riemannian/Fisher-Rao) for low-rank OT with concrete algorithmic tools (projectors, retractions, Hessian-vector products), efficient per-iteration linear scaling, and extensions to OT, GW, fused GW, and unbalanced variants—areas with substantial applied demand. It also improves methodological rigor via second-order optimization and provides an optimality certificate. Paper 2 is conceptually novel for transfer under loss shift and offers clean information-theoretic identities, but its immediate application scope and downstream tooling appear narrower.
MaxProof demonstrates extraordinary results on elite mathematical competitions (IMO 2025, USAMO 2026), exceeding human gold-medal thresholds. This represents a landmark achievement in AI reasoning capabilities with massive implications for automated theorem proving, AI safety (verification), and mathematical research. While Paper 1 makes solid technical contributions to optimal transport optimization with elegant Riemannian geometry, its impact is more incremental and confined to the OT community. Paper 2's breakthrough in AI mathematical reasoning has broader cross-disciplinary impact and greater timeliness given intense interest in scaling AI reasoning.
Paper 2 likely has higher scientific impact due to broader cross-field relevance and methodological depth. Optimal transport underpins many areas (ML, vision, graphics, biology, economics), and a unified Riemannian framework that extends to OT/GW/fused GW and balanced/unbalanced settings can influence a wide range of downstream methods. It introduces principled geometry (manifold modeling, Fisher–Rao metric, projectors/retractions, Hessian-vector products) enabling efficient first/second-order solvers, linear per-iteration scaling, and a rank-sufficiency optimality certificate—features that can become foundational. Paper 1 is impactful but more application-narrow (polytope-constrained generation/planning).
Paper 1 addresses the scalability and optimization challenges of optimal transport, a fundamental mathematical tool widely used across machine learning, biology, and graphics. By providing an efficient, regularization-free Riemannian framework that generalizes to multiple OT variants, it offers foundational advancements with broader cross-disciplinary impact than Paper 2, which is more narrowly focused on policy representation in reinforcement learning and robotics.
Paper 2 presents a fundamental mathematical framework for low-rank optimal transport with broad applicability across many fields (machine learning, computer vision, computational biology, NLP). Its Riemannian geometric approach is highly novel, offering theoretical depth (manifold characterization, global optimality certificates) and practical advantages (regularization-free, linear complexity, closed-form solutions for unbalanced OT). The framework unifies multiple OT variants (balanced, unbalanced, GW, fused GW, linear OT), giving it exceptional breadth. Paper 1, while addressing an important practical problem, is more incremental and narrowly focused on missing-modality multi-omics, with less generalizable methodological contributions.
Paper 2 proposes a fundamental theoretical and algorithmic advancement in Optimal Transport, a mathematical framework widely used across diverse fields like machine learning, biology, and graphics. By offering linear scaling and unified geometric solvers, it has a broader potential impact. Paper 1 provides a valuable but more narrowly focused empirical improvement for point-cloud-based imitation learning in robotics.
Paper 2 presents a highly innovative, interdisciplinary approach bridging deep learning and physics by proposing a physical analog to transformer attention. Given the massive energy consumption of modern AI, a mathematically grounded blueprint for energy-efficient attention on physical substrates offers groundbreaking real-world potential and broad impact across both neuromorphic hardware design and machine learning. While Paper 1 provides rigorous methodological advancements in optimal transport, Paper 2 addresses a more pressing, paradigm-shifting challenge.
Paper 1 likely has higher scientific impact due to greater methodological novelty (Riemannian manifold formulation with Fisher–Rao metric, projectors/retractions/HVPs, rank-sufficiency certificate) and broader applicability across OT variants (linear OT, GW, fused GW, balanced/unbalanced). Its contributions are foundational and can influence multiple fields (optimization, geometry, ML, graphics, computational biology). Paper 2 is timely and practically relevant for clinical ECG efficiency, but its core idea (using Grad-CAM-based filtering as a training signal) is more incremental and domain-specific, with narrower cross-field impact.
Paper 2 tackles a highly timely and critical bottleneck in LLM interpretability by reviving a classical method (ICA) to replace computationally expensive Sparse Autoencoders (SAEs). Its potential for immediate, widespread adoption in the booming fields of AI safety and interpretability gives it exceptional real-world applicability and breadth of impact. While Paper 1 offers a rigorous and innovative theoretical advancement in optimal transport optimization, Paper 2's practical utility in democratizing LLM analysis positions it for broader and faster scientific influence.
Paper 2 proposes a foundational, unifying theory for machine learning interpretability using Lagrangian mechanics. By addressing a critical bottleneck in AI adoption and offering a deductive framework that spans multiple interpretability subfields, it has a vastly broader potential impact than Paper 1, which, while methodologically rigorous, focuses on improving computational efficiency for a specific mathematical problem (Optimal Transport).