Mickaël Basson, Philippe Preux
Neural combinatorial optimization has recently achieved strong results on the Euclidean Traveling Salesman Problem (TSP) using generative models such as diffusion and consistency models. State-of-the-art approaches like FT2T combine fast consistency-based prediction with gradient-based inference-time refinement. However, gradient search often incurs significant computational overhead and may not align with the discrete structure of feasible solutions. We introduce Projected Consistency Inference (PCI), a plug-and-play, retraining-free alternative that replaces gradient refinement with structure-aware projections: PCI decodes valid Hamiltonian tours from the consistency model output and applies a lightweight local search (e.g., 2-opt). PCI achieves an average optimality gap (OG) of 0.17% on TSP with 500 cities, and 0.31% on TSP with 1000 cities, outperforming FT2T's best settings (OG 0.22% and 0.36%, respectively) while reducing inference time by up to 30 to 40%. PCI also exhibits lower variance and memory usage, and can surpass classical heuristics such as LKH3 in rapid solution generation. Our results demonstrate that structure-aware inference-time operations provide a practical and principled path for neural TSP solvers, complementing training-time objectives.
The paper introduces Projected Consistency Inference (PCI), a retraining-free, plug-and-play inference-time technique for diffusion/consistency-based neural TSP solvers. PCI replaces the gradient-based refinement used in FT2T with two structural projections: (1) a feasibility projection that decodes valid Hamiltonian tours from continuous heatmap outputs, and (2) a local search projection (2-opt) that refines tours to local optima. These projections are interleaved with the consistency model's denoise-renoise steps.
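The two projections described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy edge-insertion decoder, the function names (`greedy_decode`, `two_opt`, `tour_length`), and the use of `exp(-distance)` as a stand-in for a consistency model's heatmap are all assumptions made for illustration, and the interleaving with denoise-renoise steps is not modeled.

```python
import numpy as np


def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    n = len(tour)
    return sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))


def greedy_decode(heatmap):
    """Feasibility projection (assumed variant): add edges in decreasing
    heatmap score, skipping any edge that would give a node degree > 2
    or close a cycle early. The result is a Hamiltonian path, which is
    read off as a tour. Requires at least 2 nodes."""
    n = heatmap.shape[0]
    scores = heatmap + heatmap.T  # symmetrize edge scores
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(-scores[iu, ju])
    degree = [0] * n
    parent = list(range(n))  # union-find to detect premature cycles
    adj = [[] for _ in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for k in order:
        i, j = int(iu[k]), int(ju[k])
        if degree[i] >= 2 or degree[j] >= 2:
            continue
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        parent[ri] = rj
        adj[i].append(j)
        adj[j].append(i)
        degree[i] += 1
        degree[j] += 1

    # Walk the path from one endpoint; the closing edge is implicit.
    start = next(v for v in range(n) if degree[v] == 1)
    tour, prev, cur = [start], -1, start
    while len(tour) < n:
        nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
        tour.append(nxt)
        prev, cur = cur, nxt
    return tour


def two_opt(tour, dist):
    """Local search projection: apply improving 2-opt moves until none remain."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two tour edges share a node
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                delta = dist[a, c] + dist[b, d] - dist[a, b] - dist[c, d]
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour


# Toy usage: random points, with exp(-distance) standing in for model output.
rng = np.random.default_rng(0)
n = 30
points = rng.random((n, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
heatmap = np.exp(-dist)  # assumption: a real heatmap would come from the model

tour = greedy_decode(heatmap)
refined = two_opt(tour, dist)
```

In this sketch the decoder guarantees feasibility by construction, and 2-opt can only shorten the decoded tour, so neither step needs gradients of the model.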
The core insight is straightforward: rather than performing gradient descent in a continuous relaxation of a discrete combinatorial space (which is geometrically misaligned with the problem structure), enforce structural constraints directly through projections. This idea draws from IDEQ's training-time structural insights but applies them purely at inference time, requiring no model retraining.
The practical impact is moderate. PCI demonstrates that simple, well-understood combinatorial optimization techniques can effectively replace expensive gradient computations in neural CO pipelines. The 30-40% inference time reduction with improved solution quality is meaningful for deployment scenarios. The plug-and-play nature (no retraining) lowers the barrier to adoption.
However, the broader impact is somewhat limited because:
The paper addresses a timely topic. Inference-time compute scaling is a major trend in generative AI (the paper itself cites Ma et al., 2025 on scaling inference compute for diffusion models). The idea of replacing gradient-based refinement with structure-aware operations aligns with the growing recognition that exploiting problem structure is essential for neural CO.
The work is relevant to the active debate in neural CO about whether neural methods can truly compete with classical heuristics. PCI's ability to match LKH3 on in-distribution instances (while being significantly faster in some regimes) is noteworthy, though the out-of-distribution gap tempers enthusiasm.
The paper's framing as "structure-aware inference" is compelling conceptually but somewhat oversells the contribution — the projections used are standard CO techniques. The more accurate framing might be: "simple post-processing outperforms expensive gradient refinement in neural CO," which is itself an interesting and useful finding.
The paper would benefit from a more detailed analysis of *why* projections work better than gradients, beyond the hypotheses offered in Section 5. For instance, analyzing how often gradient steps actually improve the discrete solution after decoding would provide concrete evidence.
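One way to make the suggested diagnostic concrete is to record the decoded tour length immediately before and after each gradient step and report the fraction of steps that shorten the tour. The sketch below assumes such paired measurements are available; the function name `improvement_rate` and the toy numbers are hypothetical placeholders, not results from the paper.

```python
def improvement_rate(lengths_before, lengths_after, tol=1e-9):
    """Fraction of steps whose decoded tour is strictly shorter afterwards."""
    if len(lengths_before) != len(lengths_after):
        raise ValueError("expected one 'after' length per 'before' length")
    if not lengths_before:
        return 0.0
    improved = sum(
        1 for before, after in zip(lengths_before, lengths_after)
        if after < before - tol
    )
    return improved / len(lengths_before)


# Placeholder values for illustration only (not measured data).
before = [10.0, 9.5, 9.5, 9.2]
after = [9.5, 9.5, 9.6, 9.0]
rate = improvement_rate(before, after)
```

A low rate would support the claim that gradient updates in the continuous relaxation often fail to change, or even worsen, the decoded discrete solution.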
The writing is generally clear, though the background section is somewhat lengthy relative to the methodological contribution.
Generated Jun 9, 2026
Paper 2 addresses a fundamental challenge in neural combinatorial optimization (TSP) by introducing structure-aware projections to diffusion models. Its ability to outperform state-of-the-art neural solvers and classical heuristics while significantly reducing inference time and memory usage provides high methodological rigor and broad real-world applicability in logistics and operations research. Paper 1, while comprehensive, primarily focuses on assembling existing MLLM techniques (LoRA, CoT, TTA) and exhibits narrower fundamental algorithmic innovation compared to Paper 2.
Paper 1 introduces a novel, retraining-free methodology (PCI) that advances the state-of-the-art in neural combinatorial optimization, demonstrating clear improvements in accuracy, speed, and memory usage for a foundational problem (TSP). In contrast, Paper 2 is primarily a replication and evaluation study of an existing model, yielding negative results. While valuable, Paper 1's algorithmic innovation and strong empirical performance offer a broader and more transformative impact on operations research and generative AI.
Paper 1 addresses a fundamental question about LLM reasoning and self-knowledge—whether models truly understand the drivers of their own decisions. This 'superficial belief' concept has broad implications for AI alignment, interpretability, and trust in LLM outputs, affecting virtually all LLM applications. Paper 2 makes a solid incremental improvement to neural TSP solvers with practical engineering value, but operates in a narrower domain. The conceptual contribution of Paper 1 regarding the gap between LLM behavior and self-reported reasoning is more likely to influence multiple research communities and spark follow-up work.
Paper 2 has higher potential scientific impact due to broader applicability and enabling resources: it introduces a new annotated dataset (AntPlan-270) that can catalyze follow-on research, and an end-to-end, editable furnishing pipeline combining a DSL, constraint reasoning traces, preference optimization, and rendering—relevant to vision-language grounding, structured generation, HCI/CAD, and graphics. While Paper 1 is methodologically solid and improves neural TSP inference efficiency, its impact is narrower (mainly TSP/CO inference-time tricks) and less likely to generalize broadly beyond similar diffusion/consistency solvers.
Paper 1 establishes a foundational taxonomy and roadmap for a highly timely field (Self-Explainability in AI). Systematic reviews that define terminology and outline research directions in broad, fast-growing areas typically achieve higher cross-disciplinary impact and citation counts than specific algorithmic improvements. While Paper 2 offers rigorous, state-of-the-art results for a classic problem (TSP), its impact is largely confined to the niche of neural combinatorial optimization, whereas Paper 1 addresses trust and explainability applicable across numerous AI domains.
Paper 2 addresses a critical, highly timely issue: LLM hallucinations and data integrity in clinical research manuscripts. Its architecture for verifiable, deterministic checks has broad applications across biomedical informatics and scientific writing, potentially preventing widespread misinformation. While Paper 1 offers valuable algorithmic improvements for the Traveling Salesman Problem, Paper 2's impact extends across the broader scientific community by ensuring the reliability and safety of AI-assisted research outputs.
PRISM addresses a more novel and timely problem—recovering instruction sets from LLM activations for AI safety and monitoring—which has broad implications across AI alignment, security, and interpretability. The problem formalization of 'instruction set retrieval' is new, and the approach addresses critical concerns about prompt injection and hidden objectives in agentic AI systems. Paper 1, while technically solid, offers an incremental improvement to neural TSP solvers with a relatively narrow scope. Paper 2's relevance to AI safety gives it significantly broader potential impact across the rapidly growing field of LLM deployment.
Paper 2 has broader, more timely impact: long-horizon context management is a central bottleneck for real-world LLM deployments across many domains. TokenMizer’s graph-structured memory proxy is a novel systems approach with clear practical adoption potential (open-source, deployable) and cross-field relevance (NLP, IR, software engineering, HCI). It includes benchmarked gains in token economy and recall plus ablations, suggesting reasonable rigor. Paper 1 is strong but more incremental (inference-time projection + local search) and narrower in application scope (Euclidean TSP solvers), limiting overall impact breadth.
Paper 2 likely has higher impact: it tackles a timely, high-stakes real-world problem (scalable web agents in dynamic cloud UIs) with a deployable training recipe combining distillation + RL, and introduces engineering/methodological contributions (high-determinism rollouts, audit-log-grounded reward evaluation) that generalize to other real-environment agent training. The applications span software testing, DevOps, enterprise automation, and RL/LLM alignment. Paper 1 is solid but more incremental (inference-time projection/local search for neural TSP) with narrower cross-domain reach.
Paper 2 addresses a fundamental NP-hard problem (TSP) and achieves a significant milestone by outperforming both state-of-the-art neural solvers and classical heuristics (LKH3) in rapid generation. Its retraining-free, plug-and-play approach significantly reduces inference time and memory usage, offering high practical utility and broad impact in operations research and combinatorial optimization.