Learning Admissible Heuristics via Cost Partitioning
Hugo Barral, Quentin Cappart, Marie-José Huguet, Sylvie Thiébaux
Abstract
Admissible heuristics are essential for optimal planning, yet learning them remains challenging due to the risk of overestimation. Cost partitioning combines multiple abstraction heuristics while preserving admissibility, but computing optimal partitions online is expensive. We propose a framework that learns to infer admissible cost partitions by leveraging the Lagrangian dual equivalence between cost partitioning and multiplier prediction. Planning states and patterns are encoded as labelled graphs, and an action-centric variant of the Weisfeiler-Leman algorithm extracts structural feature vectors. A deep architecture with axial self-attention and a softmax output layer maps these features to cost weights that satisfy the partition constraints by construction, ensuring admissibility. Experiments demonstrate reduced node expansions compared to suboptimal partitioning baselines while maintaining strict admissibility. To our knowledge, this is the first machine-learned heuristic guaranteed to be admissible.
AI Impact Assessments
(1 models)Scientific Impact Assessment: Learning Admissible Heuristics via Cost Partitioning
1. Core Contribution
This paper addresses a fundamental open problem in AI planning: learning heuristics that are *guaranteed* admissible for optimal planning. The key insight is to avoid learning heuristic values directly (which risks overestimation) and instead learn to *combine* existing admissible heuristics via cost partitioning. The authors exploit the theoretical equivalence between optimal cost partitioning (OCP) and Lagrangian dual multiplier prediction (Pommerening et al. 2019) to reframe the problem: rather than solving an expensive LP at every search node, a neural network predicts cost partition weights that, by architectural construction (softmax over the pattern dimension), satisfy the partition constraint and thus guarantee admissibility.
The pipeline consists of three stages: (1) encoding planning states and pattern abstractions as labeled graphs (AOAG or FLG variants), (2) extracting action-centric features via a novel variant of the Weisfeiler-Leman algorithm (AWL), and (3) predicting partition weights through an axial self-attention architecture with softmax output. The claim of being "the first machine-learned heuristic guaranteed to be admissible" is significant if validated, as it bridges the gap between learned heuristics (which typically lack formal guarantees) and classical admissible heuristics.
2. Methodological Rigor
Strengths in design: The admissibility guarantee is elegant and correct by construction—softmax ensures weights sum to 1 per action, directly satisfying the cost partition constraint as equality. This is a genuinely clever architectural choice that converts a hard constraint satisfaction problem into a structural property of the neural network output.
The AWL variant is a reasonable extension of WL that focuses on action-local neighborhoods, producing per-action embeddings suitable for the cost partitioning task. The axial self-attention design is well-motivated, as it separately contextualizes actions within abstractions and across abstractions.
Weaknesses: The experimental evaluation raises concerns. The learned heuristics are outperformed in coverage by the faster suboptimal baselines (hSCP+ and hGZOCP) on *all* domains, which significantly undermines the practical value. The paper acknowledges this honestly but the gap is substantial—hSCP+ solves 285 tasks vs. 137 for the best learned configuration. The node expansion comparisons (Figure 3) show mixed results even against hOCP+, with clear advantages only against hGZOCP. The training setup uses relatively small synthetic problems (e.g., blocks with 4-9 blocks), and generalization to IPC benchmark sizes appears limited, particularly for logistics and miconic.
The absence of static atoms from graph representations is acknowledged as a likely cause of poor performance on logistics and miconic, but this is a significant limitation that affects two of six domains. The training data generation via random walks may also introduce bias toward irrelevant states, as the authors note.
3. Potential Impact
Theoretical contribution: The core idea of learning to predict cost partitions rather than heuristic values is novel and opens a new research direction. The admissibility-by-construction principle via softmax could inspire similar architectural guarantees in other settings where learned predictions must satisfy linear constraints.
Practical impact (currently limited): The current implementation is slower than suboptimal baselines due to feature extraction and abstraction computation overhead. The 3.42% time on model inference suggests the bottleneck is in the pipeline surrounding the neural network, not the network itself—indicating potential for significant speedup with engineering improvements (e.g., GPU-based feature extraction).
Broader influence: This work connects several research threads—Lagrangian relaxation learning, graph neural networks for planning, and cost partitioning theory—in a coherent framework. It could catalyze further work on architecturally-guaranteed properties in learned planning components.
4. Timeliness & Relevance
The paper addresses a genuine gap: while learning for planning has seen rapid progress, virtually all learned heuristics lack admissibility guarantees. With growing interest in using ML for combinatorial optimization (including Lagrangian multiplier prediction), this work is timely. The connection to recent work on learning Lagrangian multipliers (Abbas & Swoboda 2024; Parjadis et al. 2024; Bessa et al. 2025) grounds it in an active methodological trend.
5. Strengths & Limitations
Key Strengths:
Notable Limitations:
Overall Assessment
This paper makes a valuable conceptual contribution by demonstrating that admissible heuristics can be learned with formal guarantees through architectural design. The idea of learning cost partition weights rather than heuristic values is creative and well-founded theoretically. However, the empirical results do not yet demonstrate practical advantages over existing methods—the learned heuristics solve fewer problems than even suboptimal baselines. The work is best viewed as a proof of concept establishing a new research direction rather than a practically competitive method. Several identified improvements (GPU feature extraction, richer graph representations, alternative training objectives) could substantially strengthen future iterations.
Generated Jun 5, 2026
Comparison History (18)
While Paper 1 addresses an important human-AI interaction problem, Paper 2 presents a fundamental algorithmic breakthrough by introducing the first machine-learned heuristic guaranteed to be admissible. This represents a major methodological innovation in optimal planning and search. By elegantly combining Lagrangian duals, graph neural networks, and cost constraints to ensure strict admissibility, Paper 2 solves a long-standing challenge in the field. Its rigorous theoretical guarantees and novel integration of deep learning with classical planning give it a higher potential for foundational scientific impact.
Paper 2 likely has higher scientific impact: it introduces a principled learning framework with a formal guarantee (admissibility) for heuristics in optimal planning, addressing a long-standing barrier to using ML in search while retaining correctness. The Lagrangian-dual view and constraint-by-construction network are novel and methodologically rigorous, and the approach can generalize across planning and other A*/search settings needing safe learned guidance. Paper 1 is timely and valuable for AI safety evaluation, but as a benchmark its impact depends on adoption and may be narrower than a guaranteed-correct learning method that can propagate broadly in planning/search.
Paper 2 addresses a highly timely and critical issue in modern AI: the safety and monitoring of LLM agents prone to reward hacking. Its focus on AI alignment, mechanistic interpretability, and real-world agentic risks gives it broader immediate real-world applicability and relevance compared to Paper 1. While Paper 1 presents a strong methodological breakthrough in classical optimal planning (the first guaranteed admissible ML heuristic), Paper 2's alignment with urgent safety concerns in the rapidly expanding field of autonomous LLMs suggests a significantly higher potential for broad scientific and societal impact.
Paper 2 introduces the first machine-learned heuristic guaranteed to be admissible, which is a fundamental theoretical contribution bridging AI planning and machine learning. The Lagrangian dual equivalence insight and architectural design ensuring admissibility by construction represent genuine novelty with broad implications for optimal planning, search algorithms, and safe AI. Paper 1, while showing strong empirical results, is an incremental engineering contribution combining known techniques (tree search, self-refinement, memory) in a training-free LLM reasoning framework—a crowded space with rapidly diminishing novelty.
Paper 2 addresses a highly timely and critical issue in the rapidly growing field of LLM reasoning and RL from verifiable rewards (RLVR). By providing a rigorous causal framework to distinguish genuine reward-design effects from self-consistency, it offers deep methodological value that can immediately impact how alignment and reasoning models are evaluated. While Paper 1 presents a significant novelty in classical planning, Paper 2's relevance to the broader, fast-paced generative AI community gives it a higher potential for widespread scientific impact.
Paper 1 addresses a timely and high-impact problem—reinforcement learning for tool-augmented multimodal agents—which is at the frontier of current AI research. It identifies a concrete, formally characterized failure mode (credit misassignment in GRPO), quantifies it empirically, and proposes a practical, plug-and-play solution (TAPO) that works across multiple RL algorithms and benchmarks with negligible overhead. The breadth of applicability to multimodal search agents and compatibility with mainstream RL methods gives it wider near-term impact. Paper 2 is a solid contribution to classical AI planning with formal guarantees, but addresses a narrower community and problem setting with less immediate broad impact.
Paper 1 likely has higher impact: it introduces a broad, expert-validated benchmark for continual learning in LLM-based agents across six real-world domains, plus an evaluation methodology (gain metric) that can become a community standard. Benchmarks in fast-moving frontier AI tend to drive widespread follow-on work, enable reproducibility, and influence many subfields (agents, memory, evaluation, safety). Paper 2 is novel and rigorous with guaranteed-admissible learned heuristics, but its scope is narrower (optimal planning/search) and may affect a smaller research community despite strong technical contribution.
Paper 2 presents a concrete, novel technical contribution—the first machine-learned heuristic guaranteed to be admissible—combining Lagrangian duality, graph neural networks, and cost partitioning with formal guarantees. This is a specific, verifiable advance with clear theoretical grounding. Paper 1 is a perspective/review paper that surveys hybrid modeling strategies for neurological disorders without presenting new methods or results. While Paper 1 covers an important topic, perspective papers generally have lower direct scientific impact than papers introducing novel, provably correct methodologies that solve an open problem.
Paper 2 addresses a highly timely and pervasive issue in modern AI—LLM agents repeating mistakes. By formalizing temporal and epistemic regret, it provides a strong theoretical framework bridging causal inference and agentic systems. Its broad applicability across autonomous LLM pipelines offers significantly wider potential impact than Paper 1, which is fundamentally scoped to classical optimal planning.
Paper 1 addresses a highly active and broader field (LLM agents and continual adaptation) with a novel approach to online parametric memory updates. Its potential applications across general AI agents give it a much wider impact and relevance compared to Paper 2, which, while theoretically significant, is confined to the narrower domain of classical optimal planning.
Paper 1 presents a methodologically rigorous, concrete advancement in AI optimal planning, claiming the first machine-learned heuristic guaranteed to be admissible. Its algorithmic contributions have direct, measurable applications in computer science, robotics, and logistics. In contrast, Paper 2 is a philosophical and metaphysical argument about AI consciousness. While ethically relevant, Paper 1's technical innovation, empirical validation, and practical utility give it a much higher potential for widespread, measurable scientific impact and downstream technical applications.
Paper 2 presents a scalable simulation platform and benchmark for mobile GUI agents, a rapidly growing and highly popular field. By enabling scalable RL and verifiable evaluation for LLM/VLM agents, it provides foundational infrastructure likely to be widely adopted and cited by many researchers. While Paper 1 offers a strong theoretical contribution to optimal planning, its impact is more confined to a niche community compared to the broad, immediate applicability of Paper 2 in modern AI agent research.
Paper 2 proposes a paradigm-shifting quantum-native architecture with massive implications across AI, quantum computing, and mathematics. By theoretically bypassing the quadratic bottleneck of classical self-attention and demonstrating deterministic, exact generalization ('crystallization') on real NISQ hardware, it promises a universally superior substrate for mathematical reasoning. In contrast, while Paper 1 makes a highly rigorous and important contribution to optimal AI planning (the first guaranteed admissible learned heuristic), its impact is largely confined to a specific subfield, making Paper 2's potential breadth and magnitude of impact significantly higher.
Paper 1 addresses a fundamental open problem in AI planning—learning admissible heuristics with formal guarantees—by providing the first machine-learned heuristic that is guaranteed admissible by construction. This bridges the gap between learned and formal methods in a principled way, combining Lagrangian duality, graph neural networks, and cost partitioning theory. Its theoretical contribution (guaranteed admissibility) is a significant milestone with broad implications for optimal planning and safe AI. Paper 2, while practically useful, proposes a more incremental engineering contribution (consequence-aware compute allocation) that, though well-executed, has narrower theoretical novelty.
Paper 2 addresses hallucinations in LLMs, which is one of the most pressing and broadly impactful problems in AI today. Its lightweight, training-free decoding framework (DeLask) is immediately applicable to a wide range of existing LLMs, giving it enormous practical reach. The theoretical grounding connecting Transformer layers to gradient descent steps and the novel driftance metric offer meaningful conceptual contributions. Paper 1, while technically rigorous and novel in guaranteeing admissibility in learned heuristics, addresses a narrower community (optimal planning/search). Paper 2's breadth of impact, timeliness, and real-world applicability give it higher potential scientific impact.
Paper 1 targets the highly active field of Large Language Model architectures by fusing Transformers and State Space Models. Its approach to improving attention mechanisms has broad real-world applicability and high relevance to current AI trends. While Paper 2 presents a significant theoretical milestone in classical planning by learning admissible heuristics, its overall breadth of impact and application scope are narrower compared to fundamental advancements in foundational language models.
Paper 1 introduces the first machine-learned heuristic with guaranteed admissibility, solving a fundamental open problem at the intersection of machine learning and AI planning. The theoretical contribution—connecting cost partitioning's Lagrangian dual to neural network prediction with admissibility by construction—is novel and elegant. Paper 2 addresses an important but narrower engineering problem (constant-VRAM memory for embodied agents), with modest empirical gains and a self-acknowledged vacuous theoretical bound. Paper 1's guaranteed admissibility property and broader applicability to optimal planning give it higher potential for lasting scientific impact across AI subfields.
Paper 1 presents a genuinely novel contribution at the intersection of machine learning and AI planning—the first learned heuristic with guaranteed admissibility. This addresses a fundamental open problem by combining cost partitioning theory with deep learning in a principled way (Lagrangian duality, structural graph features, architectural guarantees). It has broad impact across planning, combinatorial optimization, and ML theory. Paper 2 addresses an important practical problem in agentic AI governance but is more incremental, extending existing authorization frameworks with new compositional primitives. While timely, it is narrower in scientific scope and more engineering-oriented.