Learning Admissible Heuristics via Cost Partitioning

Hugo Barral, Quentin Cappart, Marie-José Huguet, Sylvie Thiébaux

Jun 3, 2026

arXiv:2606.04597v1 PDF

cs.AI(primary)

#1512of 3404·Artificial Intelligence

#1512 of 3404 · Artificial Intelligence

Tournament Score

1416±43

10501800

44%

Win Rate

Wins

Losses

Matches

Rating

5.5/ 10

Significance7

Rigor5.5

Novelty7.5

Clarity7.5

Tournament Score

1416±43

10501800

44%

Win Rate

Wins

Losses

Matches

Rating

5.5/ 10

Significance

Rigor

Novelty

Clarity

Abstract

Admissible heuristics are essential for optimal planning, yet learning them remains challenging due to the risk of overestimation. Cost partitioning combines multiple abstraction heuristics while preserving admissibility, but computing optimal partitions online is expensive. We propose a framework that learns to infer admissible cost partitions by leveraging the Lagrangian dual equivalence between cost partitioning and multiplier prediction. Planning states and patterns are encoded as labelled graphs, and an action-centric variant of the Weisfeiler-Leman algorithm extracts structural feature vectors. A deep architecture with axial self-attention and a softmax output layer maps these features to cost weights that satisfy the partition constraints by construction, ensuring admissibility. Experiments demonstrate reduced node expansions compared to suboptimal partitioning baselines while maintaining strict admissibility. To our knowledge, this is the first machine-learned heuristic guaranteed to be admissible.

AI Impact Assessments

(1 models)

Scientific Impact Assessment: Learning Admissible Heuristics via Cost Partitioning

1. Core Contribution

This paper addresses a fundamental open problem in AI planning: learning heuristics that are *guaranteed* admissible for optimal planning. The key insight is to avoid learning heuristic values directly (which risks overestimation) and instead learn to *combine* existing admissible heuristics via cost partitioning. The authors exploit the theoretical equivalence between optimal cost partitioning (OCP) and Lagrangian dual multiplier prediction (Pommerening et al. 2019) to reframe the problem: rather than solving an expensive LP at every search node, a neural network predicts cost partition weights that, by architectural construction (softmax over the pattern dimension), satisfy the partition constraint and thus guarantee admissibility.

The pipeline consists of three stages: (1) encoding planning states and pattern abstractions as labeled graphs (AOAG or FLG variants), (2) extracting action-centric features via a novel variant of the Weisfeiler-Leman algorithm (AWL), and (3) predicting partition weights through an axial self-attention architecture with softmax output. The claim of being "the first machine-learned heuristic guaranteed to be admissible" is significant if validated, as it bridges the gap between learned heuristics (which typically lack formal guarantees) and classical admissible heuristics.

2. Methodological Rigor

Strengths in design: The admissibility guarantee is elegant and correct by construction—softmax ensures weights sum to 1 per action, directly satisfying the cost partition constraint as equality. This is a genuinely clever architectural choice that converts a hard constraint satisfaction problem into a structural property of the neural network output.

The AWL variant is a reasonable extension of WL that focuses on action-local neighborhoods, producing per-action embeddings suitable for the cost partitioning task. The axial self-attention design is well-motivated, as it separately contextualizes actions within abstractions and across abstractions.

Weaknesses: The experimental evaluation raises concerns. The learned heuristics are outperformed in coverage by the faster suboptimal baselines (hSCP+ and hGZOCP) on *all* domains, which significantly undermines the practical value. The paper acknowledges this honestly but the gap is substantial—hSCP+ solves 285 tasks vs. 137 for the best learned configuration. The node expansion comparisons (Figure 3) show mixed results even against hOCP+, with clear advantages only against hGZOCP. The training setup uses relatively small synthetic problems (e.g., blocks with 4-9 blocks), and generalization to IPC benchmark sizes appears limited, particularly for logistics and miconic.

The absence of static atoms from graph representations is acknowledged as a likely cause of poor performance on logistics and miconic, but this is a significant limitation that affects two of six domains. The training data generation via random walks may also introduce bias toward irrelevant states, as the authors note.

3. Potential Impact

Theoretical contribution: The core idea of learning to predict cost partitions rather than heuristic values is novel and opens a new research direction. The admissibility-by-construction principle via softmax could inspire similar architectural guarantees in other settings where learned predictions must satisfy linear constraints.

Practical impact (currently limited): The current implementation is slower than suboptimal baselines due to feature extraction and abstraction computation overhead. The 3.42% time on model inference suggests the bottleneck is in the pipeline surrounding the neural network, not the network itself—indicating potential for significant speedup with engineering improvements (e.g., GPU-based feature extraction).

Broader influence: This work connects several research threads—Lagrangian relaxation learning, graph neural networks for planning, and cost partitioning theory—in a coherent framework. It could catalyze further work on architecturally-guaranteed properties in learned planning components.

4. Timeliness & Relevance

The paper addresses a genuine gap: while learning for planning has seen rapid progress, virtually all learned heuristics lack admissibility guarantees. With growing interest in using ML for combinatorial optimization (including Lagrangian multiplier prediction), this work is timely. The connection to recent work on learning Lagrangian multipliers (Abbas & Swoboda 2024; Parjadis et al. 2024; Bessa et al. 2025) grounds it in an active methodological trend.

5. Strengths & Limitations

Key Strengths:

First framework providing formal admissibility guarantees for learned heuristics—a meaningful conceptual advance.

Elegant use of softmax to enforce partition constraints by construction.

Well-motivated architectural choices (axial attention for dual contextualization).

Honest presentation of limitations with clear future directions.

The Lagrangian dual perspective unifying cost partitioning and multiplier prediction is theoretically appealing.

Notable Limitations:

Coverage is substantially worse than fast suboptimal baselines across all domains, limiting practical applicability.

Feature extraction is the computational bottleneck, and the paper doesn't explore GPU acceleration for this step despite identifying it as a clear improvement.

Only 6 domains tested, with clear failures on 2 (logistics, miconic) due to missing static atoms.

Training on small instances with limited generalization evidence to larger problems.

The paper trains on OCP solutions but acknowledges that optimal partitions aren't necessarily the best training targets—alternative loss functions (e.g., directly maximizing heuristic value) aren't explored.

Comparison with only cost-partitioning baselines; no comparison with other learned heuristics or LP-based heuristics more broadly.

The pattern selection strategy (projections up to 2 variables) is quite restrictive.

Overall Assessment

This paper makes a valuable conceptual contribution by demonstrating that admissible heuristics can be learned with formal guarantees through architectural design. The idea of learning cost partition weights rather than heuristic values is creative and well-founded theoretically. However, the empirical results do not yet demonstrate practical advantages over existing methods—the learned heuristics solve fewer problems than even suboptimal baselines. The work is best viewed as a proof of concept establishing a new research direction rather than a practically competitive method. Several identified improvements (GPU feature extraction, richer graph representations, alternative training objectives) could substantially strengthen future iterations.

Rating:5.5/ 10

Significance 7Rigor 5.5Novelty 7.5Clarity 7.5

Generated Jun 5, 2026

Comparison History (18)

vs. A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice

gemini-3.16/6/2026

While Paper 1 addresses an important human-AI interaction problem, Paper 2 presents a fundamental algorithmic breakthrough by introducing the first machine-learned heuristic guaranteed to be admissible. This represents a major methodological innovation in optimal planning and search. By elegantly combining Lagrangian duals, graph neural networks, and cost constraints to ensure strict admissibility, Paper 2 solves a long-standing challenge in the field. Its rigorous theoretical guarantees and novel integration of deep learning with classical planning give it a higher potential for foundational scientific impact.

vs. CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

gpt-5.26/6/2026

Paper 2 likely has higher scientific impact: it introduces a principled learning framework with a formal guarantee (admissibility) for heuristics in optimal planning, addressing a long-standing barrier to using ML in search while retaining correctness. The Lagrangian-dual view and constraint-by-construction network are novel and methodologically rigorous, and the approach can generalize across planning and other A*/search settings needing safe learned guidance. Paper 1 is timely and valuable for AI safety evaluation, but as a benchmark its impact depends on adoption and may be narrower than a guaranteed-correct learning method that can propagate broadly in planning/search.

vs. From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents

gemini-3.16/6/2026

Paper 2 addresses a highly timely and critical issue in modern AI: the safety and monitoring of LLM agents prone to reward hacking. Its focus on AI alignment, mechanistic interpretability, and real-world agentic risks gives it broader immediate real-world applicability and relevance compared to Paper 1. While Paper 1 presents a strong methodological breakthrough in classical optimal planning (the first guaranteed admissible ML heuristic), Paper 2's alignment with urgent safety concerns in the rapidly expanding field of autonomous LLMs suggests a significantly higher potential for broad scientific and societal impact.

vs. ReTreVal: Reasoning Tree with Validation and Cross-Problem Memory for Large Language Models

claude-opus-4.66/5/2026

Paper 2 introduces the first machine-learned heuristic guaranteed to be admissible, which is a fundamental theoretical contribution bridging AI planning and machine learning. The Lagrangian dual equivalence insight and architectural design ensuring admissibility by construction represent genuine novelty with broad implications for optimal planning, search algorithms, and safe AI. Paper 1, while showing strong empirical results, is an incremental engineering contribution combining known techniques (tree search, self-refinement, memory) in a training-free LLM reasoning framework—a crowded space with rapidly diminishing novelty.

vs. A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR

gemini-3.16/5/2026

Paper 2 addresses a highly timely and critical issue in the rapidly growing field of LLM reasoning and RL from verifiable rewards (RLVR). By providing a rigorous causal framework to distinguish genuine reward-design effects from self-consistency, it offers deep methodological value that can immediately impact how alignment and reasoning models are evaluated. While Paper 1 presents a significant novelty in classical planning, Paper 2's relevance to the broader, fast-paced generative AI community gives it a higher potential for widespread scientific impact.

vs. TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents

claude-opus-4.66/5/2026

Paper 1 addresses a timely and high-impact problem—reinforcement learning for tool-augmented multimodal agents—which is at the frontier of current AI research. It identifies a concrete, formally characterized failure mode (credit misassignment in GRPO), quantifies it empirically, and proposes a practical, plug-and-play solution (TAPO) that works across multiple RL algorithms and benchmarks with negligible overhead. The breadth of applicability to multimodal search agents and compatibility with mainstream RL methods gives it wider near-term impact. Paper 2 is a solid contribution to classical AI planning with formal guarantees, but addresses a narrower community and problem setting with less immediate broad impact.

vs. Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

gpt-5.26/5/2026

Paper 1 likely has higher impact: it introduces a broad, expert-validated benchmark for continual learning in LLM-based agents across six real-world domains, plus an evaluation methodology (gain metric) that can become a community standard. Benchmarks in fast-moving frontier AI tend to drive widespread follow-on work, enable reproducibility, and influence many subfields (agents, memory, evaluation, safety). Paper 2 is novel and rigorous with guaranteed-admissible learned heuristics, but its scope is narrower (optimal planning/search) and may affect a smaller research community despite strong technical contribution.

vs. Integrating Mechanistic and Data-Driven Models for Neurological Disorders through Differentiable Programming

claude-opus-4.66/5/2026

Paper 2 presents a concrete, novel technical contribution—the first machine-learned heuristic guaranteed to be admissible—combining Lagrangian duality, graph neural networks, and cost partitioning with formal guarantees. This is a specific, verifiable advance with clear theoretical grounding. Paper 1 is a perspective/review paper that surveys hybrid modeling strategies for neurological disorders without presenting new methods or results. While Paper 1 covers an important topic, perspective papers generally have lower direct scientific impact than papers introducing novel, provably correct methodologies that solve an open problem.

vs. Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers

gemini-3.16/5/2026

Paper 2 addresses a highly timely and pervasive issue in modern AI—LLM agents repeating mistakes. By formalizing temporal and epistemic regret, it provides a strong theoretical framework bridging causal inference and agentic systems. Its broad applicability across autonomous LLM pipelines offers significantly wider potential impact than Paper 1, which is fundamentally scoped to classical optimal planning.

vs. Scaling Self-Evolving Agents via Parametric Memory

gemini-3.16/5/2026

Paper 1 addresses a highly active and broader field (LLM agents and continual adaptation) with a novel approach to online parametric memory updates. Its potential applications across general AI agents give it a much wider impact and relevance compared to Paper 2, which, while theoretically significant, is confined to the narrower domain of classical optimal planning.

vs. Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

gemini-3.16/5/2026

Paper 1 presents a methodologically rigorous, concrete advancement in AI optimal planning, claiming the first machine-learned heuristic guaranteed to be admissible. Its algorithmic contributions have direct, measurable applications in computer science, robotics, and logistics. In contrast, Paper 2 is a philosophical and metaphysical argument about AI consciousness. While ethically relevant, Paper 1's technical innovation, empirical validation, and practical utility give it a much higher potential for widespread, measurable scientific impact and downstream technical applications.

vs. MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

gemini-3.16/5/2026

Paper 2 presents a scalable simulation platform and benchmark for mobile GUI agents, a rapidly growing and highly popular field. By enabling scalable RL and verifiable evaluation for LLM/VLM agents, it provides foundational infrastructure likely to be widely adopted and cited by many researchers. While Paper 1 offers a strong theoretical contribution to optimal planning, its impact is more confined to a niche community compared to the broad, immediate applicability of Paper 2 in modern AI agent research.

vs. Universal Quantum Transformer

gemini-3.16/5/2026

Paper 2 proposes a paradigm-shifting quantum-native architecture with massive implications across AI, quantum computing, and mathematics. By theoretically bypassing the quadratic bottleneck of classical self-attention and demonstrating deterministic, exact generalization ('crystallization') on real NISQ hardware, it promises a universally superior substrate for mathematical reasoning. In contrast, while Paper 1 makes a highly rigorous and important contribution to optimal AI planning (the first guaranteed admissible learned heuristic), its impact is largely confined to a specific subfield, making Paper 2's potential breadth and magnitude of impact significantly higher.

vs. Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation

claude-opus-4.66/5/2026

Paper 1 addresses a fundamental open problem in AI planning—learning admissible heuristics with formal guarantees—by providing the first machine-learned heuristic that is guaranteed admissible by construction. This bridges the gap between learned and formal methods in a principled way, combining Lagrangian duality, graph neural networks, and cost partitioning theory. Its theoretical contribution (guaranteed admissibility) is a significant milestone with broad implications for optimal planning and safe AI. Paper 2, while practically useful, proposes a more incremental engineering contribution (consequence-aware compute allocation) that, though well-executed, has narrower theoretical novelty.

vs. Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

claude-opus-4.66/5/2026

Paper 2 addresses hallucinations in LLMs, which is one of the most pressing and broadly impactful problems in AI today. Its lightweight, training-free decoding framework (DeLask) is immediately applicable to a wide range of existing LLMs, giving it enormous practical reach. The theoretical grounding connecting Transformer layers to gradient descent steps and the novel driftance metric offer meaningful conceptual contributions. Paper 1, while technically rigorous and novel in guaranteeing admissibility in learned heuristics, addresses a narrower community (optimal planning/search). Paper 2's breadth of impact, timeliness, and real-world applicability give it higher potential scientific impact.

vs. Forget Attention: Importance-Aware Attention Is All You Need

gemini-3.16/5/2026

Paper 1 targets the highly active field of Large Language Model architectures by fusing Transformers and State Space Models. Its approach to improving attention mechanisms has broad real-world applicability and high relevance to current AI trends. While Paper 2 presents a significant theoretical milestone in classical planning by learning admissible heuristics, its overall breadth of impact and application scope are narrower compared to fundamental advancements in foundational language models.

vs. AURA: Action-Gated Memory for Robot Policies at Constant VRAM

claude-opus-4.66/5/2026

Paper 1 introduces the first machine-learned heuristic with guaranteed admissibility, solving a fundamental open problem at the intersection of machine learning and AI planning. The theoretical contribution—connecting cost partitioning's Lagrangian dual to neural network prediction with admissibility by construction—is novel and elegant. Paper 2 addresses an important but narrower engineering problem (constant-VRAM memory for embodied agents), with modest empirical gains and a self-acknowledged vacuous theoretical bound. Paper 1's guaranteed admissibility property and broader applicability to optimal planning give it higher potential for lasting scientific impact across AI subfields.

vs. Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

claude-opus-4.66/5/2026

Paper 1 presents a genuinely novel contribution at the intersection of machine learning and AI planning—the first learned heuristic with guaranteed admissibility. This addresses a fundamental open problem by combining cost partitioning theory with deep learning in a principled way (Lagrangian duality, structural graph features, architectural guarantees). It has broad impact across planning, combinatorial optimization, and ML theory. Paper 2 addresses an important practical problem in agentic AI governance but is more incremental, extending existing authorization frameworks with new compositional primitives. While timely, it is narrower in scientific scope and more engineering-oriented.