LLM-Evolved Pattern Generators for Optimal Classical Planning
Windy Phung, Dominik Drexler, Arnaud Lequen, Jendrik Seipp
Abstract
Learned heuristics have recently become a competitive alternative to traditional domain-independent heuristics for satisficing planning. Existing approaches, however, focus on improving search guidance rather than guaranteeing admissibility, which makes them unsuitable for optimal classical planning. We present the first method for learning domain-dependent heuristics that are admissible by design and thus preserve the optimality guarantees of A* search. Instead of learning a direct mapping from states to heuristic values, we learn to construct abstractions that induce admissible heuristics. We use an LLM-driven evolutionary program-synthesis framework to obtain, for each domain, a program that produces a pattern collection for any task in that domain, and we combine the resulting patterns admissibly via saturated cost partitioning. Empirically, the learned programs encode interpretable domain-specific insights, run with negligible overhead at test time and yield heuristics that match the coverage of state-of-the-art domain-independent baselines on several domains while evaluating each state substantially faster.
AI Impact Assessments
(1 models)Scientific Impact Assessment: LLM-Evolved Pattern Generators for Optimal Classical Planning
1. Core Contribution
This paper addresses a genuine gap at the intersection of learned heuristics and optimal classical planning. While prior work on learning heuristics has focused on satisficing planning (sacrificing admissibility), this paper presents the first method for learning domain-dependent heuristics that are admissible by construction. The key insight is elegant: rather than learning a direct state-to-value mapping (which is difficult to make admissible), the authors learn to construct *abstractions* — specifically, pattern collections for pattern database (PDB) heuristics — that inherently produce admissible estimates. The admissibility guarantee comes from the mathematical properties of projections and saturated cost partitioning (SCP), not from any learned constraint.
The approach uses OpenEvolve (an open-source AlphaEvolve variant) to synthesize, per domain, a Python program that maps any task description to a pattern collection. This framing — learning a *generator program* rather than a fixed collection — is a meaningful architectural choice that enables generalization across task sizes within a domain.
2. Methodological Rigor
The methodology is sound in its theoretical foundations. The admissibility guarantee is airtight: projections yield admissible heuristics, and SCP preserves admissibility when combining them. The evolutionary loop is well-defined with a clear scoring function combining validity, expansions, and search time.
However, several aspects weaken the experimental rigor:
3. Potential Impact
The work opens a genuinely new research direction: learning admissible abstractions rather than learning heuristic values directly. This sidesteps the fundamental difficulty of enforcing admissibility on learned value functions while still capturing domain-specific structure.
Practical implications: The generated patterns run with negligible overhead at test time (the program is a simple Python function), unlike SYS-SCP which can spend 100+ seconds on pattern generation per task. This makes the approach attractive for settings where the same domain is solved repeatedly (e.g., industrial planning applications).
Broader implications: The idea of using LLMs to synthesize *structural components* of well-understood algorithmic frameworks (rather than end-to-end solutions) is a promising paradigm. It preserves formal guarantees while leveraging LLMs' pattern recognition capabilities. This template could extend to other abstraction types (Cartesian abstractions, merge-and-shrink strategies) and potentially other combinatorial optimization contexts.
4. Timeliness & Relevance
The paper is highly timely, sitting at the confluence of three active trends: (1) LLM-driven program synthesis (AlphaEvolve/FunSearch), (2) learning for planning, and (3) optimal classical planning. The observation that existing learned heuristics sacrifice admissibility is a well-known limitation that the community has struggled with. The paper directly addresses this bottleneck.
The use of OpenEvolve (open-source) enhances reproducibility and accessibility relative to proprietary systems.
5. Strengths & Limitations
Key Strengths:
Notable Weaknesses:
6. Additional Observations
The appendix containing full generator code is valuable for reproducibility and reveals interesting domain-specific strategies (e.g., the Childsnack generator's handling of gluten allergies, the Transport generator's per-package delivery patterns). This transparency is commendable.
The scoring function design — combining validity, expansions, and time — introduces an implicit bias toward faster-to-evaluate patterns rather than more informative ones. This explains both the strength (per-state speed) and weakness (guidance quality) of the approach. Alternative scoring functions could shift this balance.
The constraint of at most 20 patterns with 5×10⁶ states each is quite restrictive and may limit the approach's ability to find competitive patterns in domains like Miconic where many small patterns are more effective.
Generated Jun 2, 2026
Comparison History (21)
Paper 2 likely has higher impact: it introduces the first admissible-by-design, learned domain-dependent heuristics for optimal classical planning, preserving A* optimality—an important methodological advance with clear guarantees. The LLM-evolved program synthesis of pattern generators plus saturated cost partitioning is innovative, interpretable, and offers practical speedups with negligible test-time overhead, making it relevant to planning, search, and automated reasoning. Paper 1 is useful for instruction-following robustness, but its gains are more incremental and may be absorbed by broader alignment/prompting advances.
Paper 2 likely has higher impact due to timeliness and breadth: long-context LLM memory is a central, rapidly growing area with wide applicability across QA, agents, RAG, and document understanding. The proposed answer-conditioned information-gain reward is a generally reusable training signal that could influence RL formulations for memory, summarization, and tool-using agents. Paper 1 is novel and rigorous within optimal classical planning (admissible learned heuristics via synthesized abstractions), but the field is narrower and domain-dependent learning may limit general uptake compared to broadly deployable LLM training methodology.
Paper 2 is more novel methodologically: it uses LLM-guided evolutionary program synthesis to generate admissible, domain-dependent abstractions for optimal planning—addressing a key gap (learning while preserving A* optimality). Its rigor is strengthened by formal admissibility via abstractions and saturated cost partitioning, and its potential impact spans automated planning, program synthesis, and LLM-based tool generation with clear real-world relevance (robotics, logistics). Paper 1 is a valuable benchmark, but its impact is narrower (evaluation dataset) and may be more incremental relative to existing personalization/behavior benchmarks.
Paper 2 demonstrates higher potential scientific impact due to its interdisciplinary approach, bridging AI, physics, and materials science. While Paper 1 advances optimal classical planning within computer science, Paper 2 addresses a critical real-world bottleneck—synthesis planning for novel inorganic materials. By coupling LLMs with physics-based simulations, it paves the way for practical advancements in discovering and manufacturing new materials, which has broad, transformative applications in energy, electronics, and manufacturing.
Paper 2 has higher potential impact: it introduces a first-of-its-kind approach to learn domain-dependent heuristics that are admissible by design, directly advancing optimal classical planning with preserved A* guarantees—an enduring, broadly used formal setting. The method is technically rigorous (program synthesis + admissible combination via saturated cost partitioning), yields interpretable artifacts, and promises practical speedups with negligible test-time overhead. Its contribution generalizes across planning domains and connects LLMs to principled algorithm design. Paper 1 is timely for LLM safety, but its self-auditing premise may face robustness and external-validity limits in safety-critical oversight.
Paper 2 is more scientifically novel: it proposes admissible-by-design, domain-dependent heuristics for optimal classical planning via LLM-guided evolutionary program synthesis of abstraction/pattern generators, preserving A* optimality—an advance over typical learned (inadmissible) heuristics. Its impact spans planning, heuristic search, program synthesis, and LLM-assisted algorithm design, with strong potential for broad reuse in optimal planning pipelines. Paper 1 is timely and practically useful for LLM systems reliability, but its contributions are more incremental (orchestration/recovery engineering) and may be narrower and faster-moving with the tooling ecosystem.
Paper 2 presents a concrete, novel contribution—the first method for learning admissible domain-dependent heuristics for optimal classical planning using LLM-driven evolutionary program synthesis. It combines multiple established techniques (A* search, pattern databases, saturated cost partitioning, LLM-based program synthesis) in an innovative way, with strong empirical results matching state-of-the-art baselines. Paper 1 proposes an abstract mathematical framework for conflict in data modulation but lacks concrete algorithms, experiments, or demonstrated applications, limiting its immediate and verifiable impact.
Paper 1 is more novel and broadly impactful: it introduces an admissible-by-construction learning framework for optimal classical planning, combining LLM-driven program synthesis with evolutionary search to generate abstractions/pattern databases, preserving A* optimality—an important methodological advance in AI planning. Its approach is timely (LLMs for algorithm design), potentially transferable to other heuristic search and verification-sensitive settings, and claims interpretable artifacts plus runtime benefits. Paper 2 addresses an important applied education problem, but the methods (knowledge graphs, GATs, temporal models) are relatively standard and likely less generalizable scientifically beyond educational analytics.
Paper 2 presents a fundamentally novel approach—combining LLM-driven evolutionary program synthesis with classical AI planning to learn admissible heuristics, which is a first-of-its-kind contribution. It bridges two major AI paradigms (LLMs and formal planning), provides theoretical guarantees (admissibility), and produces interpretable results. Paper 1 offers incremental improvements to radiology report generation with modest quantitative gains. Paper 2's methodological innovation, cross-field impact (AI planning, program synthesis, LLMs), and the significance of solving optimal planning with learned heuristics give it substantially higher scientific impact potential.
Paper 2 addresses a critical bottleneck in the highly active field of LLM agent evaluation. By providing a novel, interpretable framework for analyzing agent trajectories and demonstrating tangible improvements on major benchmarks like SWE-bench, it offers broader applicability and higher timeliness than Paper 1. While Paper 1 presents a neat application of LLMs to classical planning, its impact is likely confined to a narrower, more specialized subfield, whereas Paper 2 appeals to the rapidly growing community working on autonomous AI agents.
Paper 1 represents a significant methodological breakthrough by bridging modern generative AI (LLMs) with classical symbolic AI. It presents the first method for learning domain-dependent heuristics that guarantee admissibility in optimal classical planning, solving a major limitation of prior neural-heuristic approaches. By using LLMs to synthesize interpretable abstraction programs rather than black-box mappings, it offers high methodological rigor, safety (optimality guarantees), and broad implications for hybrid neuro-symbolic systems. Paper 2 is practically valuable but addresses a more specific niche in constraint learning and recourse.
Paper 1 is more scientifically novel and methodologically substantive: it proposes the first admissible-by-design learned, domain-dependent heuristics for optimal classical planning by learning abstraction/pattern generators via LLM-guided evolutionary program synthesis, with principled admissible combination (saturated cost partitioning) and measurable efficiency gains. This advances core planning theory/practice and could influence heuristic design, program synthesis for search, and trustworthy ML-for-planning. Paper 2 is timely and useful as an evaluation benchmark for personalized tool-using agents, but benchmarks typically yield narrower scientific novelty and longer-term impact unless they become a dominant standard.
Paper 1 presents a novel and technically rigorous contribution at the intersection of LLMs and classical AI planning, introducing the first method for learning admissible domain-dependent heuristics via LLM-driven program synthesis. This addresses a fundamental gap in optimal planning with strong theoretical guarantees. Its methodological innovation—combining LLM-evolved pattern generators with saturated cost partitioning—offers a new paradigm with clear practical impact. Paper 2, while interesting, primarily applies existing psychological frameworks to LLMs without introducing fundamentally new methods, and its simulation claims face validity concerns regarding whether LLMs truly internalize values versus pattern-matching.
Paper 2 has higher estimated impact: it introduces a principled, first-of-its-kind approach to learning domain-dependent heuristics that are admissible by design, preserving optimality guarantees—an enduring, foundational contribution to classical planning. The method combines LLM-driven program synthesis with evolutionary search and integrates cleanly with established optimal planners via abstractions and saturated cost partitioning, suggesting methodological rigor and broad reuse. Its contributions generalize across planning domains and can influence automated reasoning, verification, and hybrid neuro-symbolic AI. Paper 1 is timely and useful but more application-specific to diffusion-model safety and may be superseded quickly.
Paper 1 has higher potential impact due to a more novel methodological contribution: the first admissible-by-design learned domain-dependent heuristics for optimal classical planning, combining LLM-driven program synthesis with evolutionary search and saturated cost partitioning. This advances a core algorithmic guarantee (optimality) and can influence planning, heuristic search, and neurosymbolic methods broadly. Paper 2 provides a valuable benchmark, but benchmarks tend to have impact mainly within the evaluation ecosystem and may age faster as simulators/tasks become superseded. Paper 1’s technique is more general and foundational.
Paper 2 is more novel and potentially higher impact: it introduces the first learned, domain-dependent heuristics that are admissible by construction for optimal planning, combining LLM-driven program synthesis with established admissible composition (saturated cost partitioning). This bridges modern LLM synthesis with classical guarantees, enabling real-world deployment where optimality matters (robotics, logistics, verification) while improving speed and maintaining coverage. Paper 1 is valuable and timely as an evaluation framework for continual learning in agents, but benchmarks/probing tools typically have narrower downstream impact than a new guarantee-preserving method that can directly improve core planning performance.
Paper 1 introduces a fundamentally novel approach—using LLM-driven evolutionary program synthesis to learn admissible heuristics for optimal classical planning—bridging AI planning and LLMs in a way not previously done. It guarantees optimality (admissibility by design), offers interpretable insights, and matches state-of-the-art coverage with faster evaluation. Paper 2, while useful, addresses a more incremental problem (diversifying LLM idea generation) with narrower methodological novelty and impact scope. Paper 1's contribution to automated planning theory and practice gives it substantially broader and deeper scientific significance.
Paper 2 addresses a critical and widely relevant problem in AI today: improving the reliability and multi-step reasoning capabilities of Large Language Models. By combining neural inference with symbolic computation and dynamic memory, its approach has broad applicability across natural language processing and general AI problem-solving. In contrast, Paper 1 offers strong methodological rigor and important guarantees, but its focus on optimal classical planning is more niche, resulting in a narrower potential impact.
Paper 2 addresses a highly timely and critical issue in the widespread deployment of Large Language Models: the impact of compression on uncertainty estimation. Because LLM compression is ubiquitous for practical deployment, evaluating its effect on safety-critical uncertainty metrics gives this paper a broader and more immediate real-world impact across AI domains compared to Paper 1, which focuses on the relatively narrower field of classical planning.
Paper 2 likely has higher scientific impact due to strong real-world applicability (subject-independent EEG decoding is a key bottleneck for practical BCIs), broad cross-field relevance (neuroscience, ML, multimodal learning, healthcare), and timeliness (multimodal priors, distillation). It proposes a generally reusable paradigm—video-derived semantic/motor priors with no inference overhead—and reports clear gains on public datasets. Paper 1 is novel and rigorous for optimal planning, but is more niche and domain-dependent, limiting breadth and immediate application compared to BCI advances.