Revealing Algorithmic Deductive Circuits for Logical Reasoning
Phuong Minh Nguyen, Tien Huu Dang, Naoya Inoue
Abstract
Recent studies have shown that Large Language Models (LLMs) can achieve strong reasoning performance by incorporating functional symbolic representations that abstractly describe graph traversal algorithms and step-by-step reasoning in few-shot learning settings. However, it remains unclear how LLMs genuinely understand the abstract meaning of each reasoning step and the overall algorithm from only a limited number of demonstrations. This work aims to localize the attention heads responsible for individual reasoning steps and characterize the types of information transferred among them. We first align constituent reasoning steps with their corresponding token logits under a symbolic-aided Chain-of-Thought (CoT) prompting framework. Our analysis shows that token positions that steer the reasoning process are associated with low confidence scores caused by constraints on satisfying reasoning behavior patterns in demonstrations. We then adopt causal mediation analysis techniques to identify the attention heads responsible for these patterns. In addition, our findings indicate that LLMs retrieve factual and rule-based information for individual sub-reasoning tasks through specialized attention heads (approximately 3% total heads), whereas higher layers predominantly facilitate information integration and the emergence of global reasoning strategies (e.g., graph traversal algorithms) that coordinate multiple intermediate reasoning steps to solve the overall task.
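The link the abstract draws between low confidence and reasoning decision points can be illustrated with a minimal sketch. It assumes that "confidence" means the softmax probability assigned to the emitted token and that a fixed threshold of 0.5 marks a token as uncertain. Both choices, along with the logits and the names token_confidences and uncertain_positions, are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidences(step_logits, chosen_ids):
    """Probability the model assigned to each token it actually emitted."""
    return [softmax(logits)[tok] for logits, tok in zip(step_logits, chosen_ids)]

def uncertain_positions(confidences, threshold=0.5):
    """Indices whose confidence falls below the (assumed) threshold."""
    return [i for i, c in enumerate(confidences) if c < threshold]

# Hypothetical logits over a 3-token vocabulary at four generation steps.
step_logits = [
    [8.0, 0.0, 0.0],  # near-deterministic formatting token
    [1.0, 0.9, 0.8],  # decision point: several premises look plausible
    [7.0, 0.5, 0.0],  # near-deterministic again
    [0.6, 0.5, 0.0],  # decision point: keep selecting premises or stop
]
chosen_ids = [0, 0, 0, 0]  # token id emitted at each step

conf = token_confidences(step_logits, chosen_ids)
flagged = uncertain_positions(conf)
print(flagged)  # [1, 3]
```

In a real setting the logits would come from the model's forward pass over the Symbolic-Aided CoT prompt, and the flagged positions would be the candidates for the causal analysis that follows.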
AI Impact Assessments
Scientific Impact Assessment
Core Contribution
This paper investigates the internal mechanisms by which LLMs perform multi-step deductive logical reasoning within Chain-of-Thought (CoT) prompting frameworks. The key contributions are: (1) identifying that "uncertain tokens" — those with low prediction confidence — correspond to critical reasoning decision points (premise selection, premise selection termination, rule selection); (2) using causal mediation analysis (activation patching and path patching) to localize the attention heads (~3% of total) responsible for these reasoning components; and (3) revealing a hierarchical "circuit network" architecture where early layers handle factual/rule retrieval, middle layers match rule conditions, and higher layers integrate information for global reasoning strategies like graph traversal algorithms.
The novelty lies in extending circuit analysis from simple input-output mappings to multi-step reasoning processes. Previous circuit discovery work (Hong et al., 2026; Kim et al., 2025; Stolfo et al., 2023) focused on single-step tasks, while this paper decomposes complex deductive reasoning into constituent components and traces information flow across them.
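To make the activation patching mentioned above concrete, here is a toy sketch of the general technique, not the authors' implementation. The three "heads", their weights, and the clean and corrupted inputs are all hypothetical. The procedure is the standard one: cache activations from a clean run, overwrite one head at a time in a corrupted run, and measure how much of the clean output is restored.

```python
def run_model(x, patch=None):
    """Toy forward pass with three additive 'heads'.

    `patch` maps a head name to an activation that overwrites the value
    the head would otherwise compute. Returns (logit_diff, activations).
    """
    patch = patch or {}
    computed = {
        "h0": 3.0 * x[0],  # strongly reads feature 0
        "h1": 1.0 * x[0],  # weakly reads feature 0
        "h2": 0.5 * x[1],  # reads feature 1, which the corruption leaves alone
    }
    acts = {name: patch.get(name, value) for name, value in computed.items()}
    return sum(acts.values()), acts

clean_input = (1.0, 1.0)
corrupted_input = (0.0, 1.0)  # feature 0 destroyed

clean_out, clean_acts = run_model(clean_input)
corrupted_out, _ = run_model(corrupted_input)

# Patch each head's clean activation into the corrupted run, one at a time.
restored = {}
for head in clean_acts:
    patched_out, _ = run_model(corrupted_input, patch={head: clean_acts[head]})
    restored[head] = (patched_out - corrupted_out) / (clean_out - corrupted_out)

print(restored)  # {'h0': 0.75, 'h1': 0.25, 'h2': 0.0}
```

Heads with a high restored fraction are the ones a study of this kind would flag as causally relevant. Path patching, which the paper also uses, restricts the intervention to specific sender-receiver paths and is not shown here.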
Methodological Rigor
The methodology follows established mechanistic interpretability practices (activation patching, path patching) but applies them in a more complex setting. The approach is systematic:
Strengths in methodology:
Concerns:
Potential Impact
For mechanistic interpretability: This work extends the frontier of circuit analysis to multi-step reasoning, which is a meaningful step. The concept of a "circuit network" — interconnected sub-circuits for different reasoning components — provides a useful framework for understanding complex behaviors.
For reasoning improvement: Understanding which attention heads are responsible for specific reasoning failures could inform targeted interventions (e.g., activation engineering, fine-tuning specific components). The finding that ~3% of heads are critical could guide model compression or editing strategies.
For prompt engineering: The insight that uncertain tokens at reasoning decision points are consistently challenging across models could inform better prompting strategies.
However, the practical impact may be limited because: (1) the symbolic reasoning format is not widely used in practice; (2) the analysis is restricted to relatively small models (4B-8B parameters); and (3) it's unclear how to leverage these findings for concrete improvements.
Timeliness & Relevance
The paper is timely given the intense current interest in understanding and improving LLM reasoning and the growth of mechanistic interpretability as a field. Combining circuit analysis with multi-step reasoning is an emerging research direction, and the rise of o1-style reasoning models makes understanding reasoning mechanisms increasingly relevant.
Strengths
1. Cross-model consistency: The finding that uncertain tokens and circuit structures are consistent across four different LLMs from the Llama, Qwen, and Phi families strengthens the generalizability claims considerably.
2. Comprehensive ablation: The knockout experiments across multiple datasets (synthesized, ProntoQA, ProofWriter, MMLU) with multiple ablation configurations provide strong evidence for the functional importance of discovered circuits.
3. Hierarchical temporal structure: The discovery that reasoning proceeds through a consistent computational pipeline (rule condition matching → traversal algorithm implementation → premise/rule selection → termination) across models is an interesting structural finding.
4. The /3Roles result: The dramatic collapse of reasoning ability when ablating all three reasoning component types simultaneously (near-zero inference accuracy) while individual ablations show more moderate effects is a compelling finding about circuit interdependence.
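The knockout protocol behind strengths 2 and 4 can be sketched as an evaluation loop over subsets of ablated components. Everything in the toy below is an assumption made for illustration: the additive evidence model, the distractor strengths, and the choice to remove a head's contribution entirely. The accuracies it prints are artifacts of the toy and do not reproduce the paper's numbers. Only the three role names come from the assessment above.

```python
from itertools import combinations

ROLES = ("premise_selection", "selection_termination", "rule_selection")

def predict_correct(distractor_strength, ablated):
    """Toy model: each surviving role adds one unit of evidence for the
    right answer; the example is solved if evidence beats the distractor."""
    evidence = sum(1.0 for role in ROLES if role not in ablated)
    return evidence > distractor_strength

def knockout_accuracy(examples, ablated):
    """Fraction of examples still solved with the given roles knocked out."""
    hits = sum(predict_correct(d, ablated) for d in examples)
    return hits / len(examples)

# Hypothetical evaluation set: one distractor strength per example.
examples = [0.5, 1.5, 2.5, 0.5]

# Evaluate every ablation configuration, from no knockout to all three roles.
results = {}
for k in range(len(ROLES) + 1):
    for subset in combinations(ROLES, k):
        results[subset] = knockout_accuracy(examples, set(subset))

for subset, acc in results.items():
    print(subset or ("none",), acc)
```

In this toy, accuracy falls from 1.0 with nothing ablated to 0.75 for any single role, 0.5 for any pair, and 0.0 when all three are removed. Comparing single-role and joint configurations in this way is what lets an ablation study separate moderate individual effects from a joint collapse.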
Limitations
1. Simplified setting: Using single-character premises removes crucial aspects of real-world reasoning (semantic understanding, ambiguity resolution, premise complexity).
2. Format dependence: The analysis is tightly coupled to the specific Symbolic-Aided CoT format. It remains unclear whether the same circuits would be identified under different prompting formats or natural language reasoning.
3. Scale limitation: All models are ≤14B parameters. Larger models may distribute reasoning differently across their architecture.
4. MLP exclusion: The decision to focus exclusively on attention heads, while justified by prior work, leaves a potentially significant component unexplored.
5. Polysemantic heads: The observation that many heads are polysemantic complicates the clean "circuit" narrative and suggests the actual mechanism may be more distributed than presented.
6. Limited comparison to prior work: There is no quantitative comparison with Hong et al. (2026) or Kim et al. (2025) on overlapping aspects of their analyses.
Overall Assessment
This is a solid mechanistic interpretability study that makes a meaningful contribution by extending circuit analysis to multi-step deductive reasoning. The cross-model validation and comprehensive ablation studies are strengths. However, the reliance on simplified synthetic data and a specific symbolic prompting format limits the generalizability of conclusions. The work represents an incremental but useful advance in understanding LLM reasoning mechanisms.
Generated May 28, 2026
Comparison History (20)
Paper 1 introduces a novel paradigm (meta-evolving) for agentic AI systems with practical implications for continual learning and skill adaptation at test time. Its hierarchical approach to jointly optimizing skills and evolving strategies addresses a timely need in deployed LLM-based agents. Paper 2 provides valuable mechanistic interpretability insights into LLM reasoning circuits, but its contributions are analytical and descriptive and do not enable new capabilities. Paper 1's broader applicability across agentic benchmarks and its practical framework for improving agent systems give it higher potential impact in the rapidly growing field of AI agents.
Paper 1 addresses a more practical and broadly impactful problem—multimodal reasoning with a novel cognitive scheduling framework that determines when to invoke visual perception during reasoning. This has immediate applications across vision-language tasks and introduces an architecturally novel paradigm beyond existing approaches. Paper 2, while offering valuable mechanistic interpretability insights into LLM reasoning circuits, is more analytical/explanatory in nature with narrower scope (logical reasoning only). Paper 1's framework-level contribution with demonstrated benchmark improvements has greater potential to influence future system design across the multimodal AI community.
Paper 2 has higher estimated impact due to broader real-world applicability and timeliness: it proposes a general, testable framework for provenance/lineage tracing of synthetic content via steganographic inheritance, with theoretical characterization and empirical validation across methods and perturbations. This directly addresses urgent problems in AI governance (attribution, accountability, misinformation) and can influence multiple fields (security, watermarking, forensics, policy). Paper 1 is novel and rigorous for mechanistic interpretability of LLM reasoning, but its impact is narrower (analysis of attention heads under specific prompting) and may translate less directly into deployable systems.
Paper 1 addresses a fundamental question about how LLMs perform reasoning internally, using causal mediation analysis to localize and characterize attention heads responsible for deductive reasoning steps. This mechanistic interpretability work has broad implications across AI safety, model understanding, and reasoning research. Paper 2 presents an incremental improvement in building energy management by incorporating PMV-based comfort metrics into RL reward shaping—a useful but narrower engineering contribution with limited novelty beyond applying an existing standard (ISO 7730) to an existing RL framework (SAC), and the results show modest improvements at best.
Paper 2 investigates the mechanistic interpretability of LLMs, uncovering how specific attention heads coordinate logical reasoning and algorithmic steps. This fundamental architectural understanding has broad, long-term implications for AI alignment, safety, and future model design. While Paper 1 provides a valuable methodological correction and statistical critique of a specific benchmark, its scope is inherently narrower and primarily impacts benchmarking practices rather than uncovering generalizable mechanisms of artificial intelligence.
Paper 1 addresses a highly practical and widespread bottleneck—the computational cost of using reasoning LLMs for evaluation. Its proposed routing framework offers a theoretically rigorous, cost-saving solution with immediate real-world applications in AI development pipelines. While Paper 2 provides valuable fundamental insights into LLM interpretability, Paper 1's combination of actionable methodology, broad applicability across AI evaluation, and strong theoretical guarantees gives it higher potential for immediate and widespread scientific impact.
Paper 1 addresses a fundamental question about how LLMs perform reasoning internally, using mechanistic interpretability (causal mediation analysis) to localize and characterize attention heads responsible for deductive reasoning steps. This has broad impact across AI interpretability, reasoning, and alignment research. Paper 2 presents an applied pipeline for tagging learning resources with competencies—useful but narrower in scope, addressing an educational technology problem with incremental methodological contributions. Paper 1's insights into LLM reasoning mechanisms are more novel and relevant to a wider research community.
Paper 1 focuses on mechanistic interpretability of LLMs, addressing fundamental questions about how they perform logical reasoning. This contributes deeply to the theoretical understanding of AI systems, with broad implications for AI safety, alignment, and architecture design. Paper 2 presents a practical and innovative tool for 3D asset generation, but its impact is largely confined to computer graphics and game development. Therefore, Paper 1 has higher potential for broad, foundational scientific impact.
Paper 1 likely has higher scientific impact due to greater novelty and broader relevance: it develops a mechanistic, causally grounded analysis (token-logit alignment + causal mediation) to localize and characterize attention heads underpinning multi-step logical reasoning, informing interpretability, alignment, and LLM reasoning research across many tasks and models. Paper 2 is timely and application-ready for industrial planning, but is more domain-specific and primarily an engineering integration of LLM interfaces with an existing SMT planner, with limited methodological innovation and small-scale evaluation.
Paper 1 addresses a critical bottleneck in the highly relevant field of LLM alignment and post-training via reinforcement learning (RLVR/GRPO). By introducing a dynamic, policy-aware rubric reward framework, it offers immediate, practical improvements in training efficiency and model performance. While Paper 2 provides valuable theoretical insights into mechanistic interpretability, Paper 1's algorithmic contribution has broader and more immediate real-world applications in developing advanced AI systems, granting it a higher potential for widespread scientific and industrial impact.
Paper 1 presents a novel and practical paradigm (TaC) that repurposes reasoning models as context compressors, achieving substantial quantitative improvements (17-23% gains) over strong baselines. It addresses the highly relevant problem of long-context efficiency with a surprisingly simple yet effective approach, has immediate practical applications for LLM deployment, and reveals an unexpected connection between reasoning and compression. Paper 2 provides valuable mechanistic interpretability insights into reasoning circuits but has narrower applicability and less immediate practical impact, primarily confirming and extending existing understanding of attention head specialization.
Paper 1 addresses a foundational challenge in modern AI: mechanistic interpretability of Large Language Models. By uncovering the specific attention heads and deductive circuits responsible for logical reasoning and algorithmic execution, this work provides critical insights into the 'black box' of LLMs. Its methodological rigor using causal mediation analysis advances the field of AI safety and model optimization. While Paper 2 offers strong applied value in computational biology, Paper 1 has a broader potential impact, as understanding and improving LLM reasoning fundamentally affects all downstream applications of AI, including scientific discovery itself.
Paper 2 is likely higher impact because it proposes a concrete, broadly applicable evaluation standard addressing a timely, field-wide measurement crisis in LLM-as-a-judge and multi-hop RAG. Its fixed-budget, cluster-aware, pre-registered, replicated protocol strengthens methodological rigor and can change empirical conclusions, affecting many future papers and benchmarks across domains. Paper 1 offers valuable mechanistic insight into attention heads and reasoning, but its impact may be narrower (analysis-specific, model- and prompting-dependent) and less immediately actionable for improving or standardizing real-world systems.
Paper 1 offers deeper mechanistic insights into how LLMs perform reasoning, identifying specific attention heads responsible for deductive reasoning steps through causal mediation analysis. This contributes fundamental understanding of LLM internals with broad implications for interpretability and model improvement. Paper 2 introduces a useful benchmark for evaluating harness effects on agent workflows, but benchmarks tend to have more incremental impact and shorter relevance windows. Paper 1's contributions to mechanistic interpretability address a more foundational question with broader applicability across the field.
Paper 1 provides a unifying taxonomy bridging NLP and Automated Planning communities for Tree-of-Thoughts reasoning, offering actionable design patterns and a research agenda. Its broad scope—systematizing a rapidly growing field and inviting cross-community collaboration—gives it higher potential for widespread adoption and citation. Paper 2, while offering valuable mechanistic interpretability insights into LLM reasoning circuits, addresses a narrower question about attention head localization. Paper 1's framework-level contribution is more likely to shape future research directions across multiple communities.
Paper 2 addresses a fundamental black-box problem in AI by investigating the mechanistic interpretability of logical reasoning in LLMs. By identifying specific deductive circuits and attention heads, it offers broad theoretical implications for AI safety, transparency, and future architecture design. While Paper 1 provides a valuable, practical training paradigm for search agents, Paper 2's insights into how reasoning capabilities structurally emerge within models have a deeper and potentially more foundational scientific impact across the broader AI research community.
Paper 1 offers higher potential impact due to its direct, scalable solution to a major bottleneck in LLM training: reliance on stronger teacher models or expensive data curation. By enabling models to bootstrap reasoning via recovery from their own noisy outputs, DenoiseRL provides a highly practical framework for self-improvement. While Paper 2 presents valuable mechanistic insights into LLM reasoning circuits, Paper 1's methodology directly advances the frontier of scalable, autonomous model capability, promising broader and more immediate real-world applications across the AI industry.
Paper 1 addresses a fundamental question about how LLMs perform reasoning internally, using mechanistic interpretability techniques to localize and characterize reasoning circuits. This has broad implications across AI safety, interpretability, and understanding of emergent capabilities in LLMs—topics of high current interest. Paper 2, while technically solid with a novel architecture for embodied agents in Minecraft, addresses a narrower application domain. Paper 1's insights into attention head specialization for reasoning steps and the emergence of algorithmic strategies in higher layers provide foundational knowledge applicable across many LLM applications and research directions.
Paper 1 offers deeper scientific novelty by mechanistically analyzing how LLMs perform reasoning through attention head localization and causal mediation analysis. It advances fundamental understanding of LLM internals with rigorous methodology. Paper 2 describes an engineering platform (Agyn) for deploying AI agents—while practically useful, it is primarily a systems/infrastructure contribution with limited scientific novelty. Paper 1's insights into reasoning circuits have broader implications for interpretability research, model design, and the broader AI/ML community.
Paper 2 provides mechanistic interpretability insights into how LLMs perform reasoning at the attention-head level, revealing specialized circuits for deductive reasoning. This addresses a fundamental question about LLM understanding and has broader implications for AI interpretability, model design, and trustworthy AI. Paper 1 offers useful empirical observations about backtracking patterns in reasoning traces with practical filtering applications, but its contributions are more incremental and narrowly focused on a diagnostic signal for one specific phenomenon. Paper 2's mechanistic approach has greater potential to influence multiple research directions in interpretability and reasoning.