Fast and accurate AI-based pre-decoders for surface codes

Christopher Chamberland, Jan Olle, Muyuan Li, Scott Thornton, Igor Baratta

Apr 14, 2026

arXiv:2604.12841v1

quant-ph (primary)

#29 of 2593 · Quantum Physics

Tournament Score: 1590 ± 39

Win Rate: 75%

Rating: 7.5 / 10

Significance 8 · Rigor 7.5 · Novelty 6.5 · Clarity 6.5

Abstract

Fast, scalable decoding architectures that operate in a block-wise parallel fashion across space and time are essential for real-time fault-tolerant quantum computing. We introduce a scalable AI-based pre-decoder for the surface code that performs local, parallel error correction with low decoding runtimes, removing the majority of physical errors before passing residual syndromes to a downstream global decoder. This modular architecture is backend-agnostic and composes with arbitrary global decoding algorithms designed for surface codes, and our implementation is completely open source. Integrated with uncorrelated PyMatching, the pipeline achieves end-to-end decoding runtimes of order $\mathcal{O}(1\,\mu\text{s})$ per round at large code distances on NVIDIA GB300 GPUs while reducing logical error rates (LERs) relative to global decoding alone. In a block-wise parallel decoding scheme with access to multiple GPUs, the decoding runtime can be reduced to well below $\mathcal{O}(1\,\mu\text{s})$ per round. We observe further LER improvements by training a larger model, outperforming correlated PyMatching up to distance-13. We additionally introduce a noise-learning architecture that infers decoding weights directly from experimentally accessible syndrome statistics without requiring an explicit circuit-level noise model. We show that purely data-driven graph weight estimation can nearly match uncorrelated PyMatching and exceed correlated PyMatching in certain regimes, enabling highly-optimized decoding when hardware noise models are unknown or time-varying, as well as training pre-decoders with realistic noise models. Together, these results establish a practical, modular, and high-throughput decoding framework suitable for large-distance surface-code implementations.

AI Impact Assessments

(3 models)

Scientific Impact Assessment

1. Core Contribution

This paper introduces a modular AI-based pre-decoder architecture for the rotated surface code that performs local spacelike and timelike corrections via 3D convolutional neural networks (CNNs), reducing syndrome density before a global decoder (PyMatching) performs final corrections. The key claim, and what distinguishes this work from prior pre-decoder efforts [9, 22-24], is that the pipeline achieves both lower logical error rates (LERs) and reduced end-to-end decoding runtimes relative to the global decoder alone. This is the first demonstration of an AI-based pre-decoder achieving both at once.
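As a rough illustration of the pre-decode-then-global-decode composition described above, the following Python sketch uses a toy 1D layout. It is not the paper's 3D CNN: `local_predecode`, `decode`, and the adjacent-pair cancellation rule are hypothetical stand-ins, chosen only to show how local corrections thin out the syndrome before a pluggable global decoder sees the residual.

```python
def local_predecode(syndrome):
    """Toy local rule (assumption, not the paper's CNN): cancel pairs of
    adjacent detection events.

    Returns (corrections, residual). A correction `i` means "flip the data
    qubit shared by detectors i and i+1" in this hypothetical 1D layout.
    """
    residual = list(syndrome)
    corrections = []
    i = 0
    while i < len(residual) - 1:
        if residual[i] and residual[i + 1]:
            corrections.append(i)
            residual[i] = residual[i + 1] = 0
            i += 2  # both detectors are now cleared
        else:
            i += 1
    return corrections, residual


def decode(syndrome, global_decoder):
    """Compose the local stage with any global decoder (a callable that
    maps a residual syndrome to a list of corrections)."""
    corrections, residual = local_predecode(syndrome)
    return corrections + list(global_decoder(residual)), residual


# Three detection events go in; only one is left for the global decoder.
local_corrections, residual = local_predecode([0, 1, 1, 0, 0, 1, 0])
```

Because the global decoder is just a callable here, any backend could be dropped in, which mirrors the backend-agnostic design the assessment highlights.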

The paper additionally introduces a noise-learning architecture that infers edge/hyperedge weights for PyMatching directly from syndrome statistics, without requiring an explicit circuit-level noise model. This addresses the practical scenario where hardware noise is unknown or drifting.

2. Methodological Rigor

The paper demonstrates strong methodological depth across several dimensions:

Data processing innovations. Three new algorithms are introduced: (1) Algorithm 1 for isolating timelike failure components, (2) Algorithm 2 for preventing artificial timelike detection events through fault deferral, and (3) Algorithm 3 for timelike homological equivalence. These are well-motivated by the physics of syndrome extraction circuits and represent genuine improvements in training label quality. The Y-error decomposition rules (Table I) and the complete homological equivalence pipeline (Figure 11) show careful attention to subtle failure mechanisms.

Systematic architecture exploration. Five pre-decoder models spanning width, depth, and kernel size axes (Table II) are benchmarked, enabling clear analysis of architectural tradeoffs. The receptive field analysis (Eq. 8) provides principled guidance for architecture selection.
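The paper's Eq. 8 is not reproduced on this page, so the sketch below falls back on the standard receptive-field formula for a stack of stride-1, undilated convolutions, R = 1 + sum(k_l - 1) along one axis. The function names are hypothetical, and the paper's own expression may differ in form.

```python
def receptive_field(kernel_sizes):
    """Receptive field along one axis for stacked stride-1, dilation-1
    convolutions: each layer with kernel k widens the field by k - 1."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def min_layers_to_cover(extent, kernel):
    """Smallest number of identical layers whose receptive field spans
    `extent` sites (for example, a code distance). Requires kernel > 1."""
    if kernel <= 1:
        raise ValueError("kernel must be larger than 1")
    return -(-(extent - 1) // (kernel - 1))  # ceiling division


print(receptive_field([3, 3, 3, 3]))   # four 3-wide layers
print(min_layers_to_cover(13, 3))      # layers needed to span 13 sites
```

This makes the depth versus kernel-size tradeoff concrete: wider kernels reach a given span in fewer layers, at a higher per-layer cost.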

Hardware benchmarking. Runtime measurements use proper methodology—CUDA graph capture, disabled host-device transfers, spin-wait synchronization, warmup iterations—on NVIDIA GB300 GPUs with FP8 precision. This is the correct approach for measuring inference latency.
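The warmup-then-measure pattern mentioned above can be sketched on the CPU as follows. This is only an analogue under stated assumptions: `benchmark` is a hypothetical helper, and the GPU-specific steps (CUDA graph capture, spin-wait synchronization, FP8 inference) are not modelled.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Median latency of `fn` in microseconds.

    Warmup calls are discarded so that one-time costs (caches, lazy
    initialisation) do not pollute the timed samples.
    """
    for _ in range(warmup):
        fn()
    samples_ns = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        fn()
        samples_ns.append(time.perf_counter_ns() - start)
    return statistics.median(samples_ns) / 1e3


latency_us = benchmark(lambda: sum(range(1000)))
```

Reporting the median rather than the mean keeps occasional scheduler hiccups from dominating the figure.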

Noise learning. The identification of 18 distance-independent edge types and 43 hyperedge type compositions, with fully differentiable probability formulas, is a technically impressive contribution that enables the noise-learning model to generalize across code distances.
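For context on what inferring weights from syndrome statistics involves, here is a minimal classical baseline, not the paper's learned model. It uses the well-known pairwise-correlation estimator for a graphlike edge between two detectors and the usual matching weight log((1 - p) / p). Hyperedges and the paper's 18 edge types are not handled, and the function names are hypothetical.

```python
import math


def edge_probability(mean_i, mean_j, mean_ij):
    """Estimate the probability of the edge joining detectors i and j from
    their firing rates <x_i>, <x_j> and joint firing rate <x_i x_j>.
    Assumes all other error mechanisms act independently on i and j."""
    covariance = mean_ij - mean_i * mean_j
    denom = 1 - 2 * mean_i - 2 * mean_j + 4 * mean_ij
    return 0.5 - math.sqrt(0.25 - covariance / denom)


def matching_weight(p):
    """Log-likelihood weight conventionally used for matching decoders."""
    return math.log((1 - p) / p)


# Synthetic check: shared edge p, plus independent flips a (on i) and b (on j).
p, a, b = 0.01, 0.02, 0.03
mean_i = p + a - 2 * p * a
mean_j = p + b - 2 * p * b
mean_ij = p * (1 - a) * (1 - b) + (1 - p) * a * b
estimate = edge_probability(mean_i, mean_j, mean_ij)  # recovers p up to rounding
```

In practice the means would be sample averages over many shots, so the estimate carries statistical noise that the exact check above does not show.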

However, some limitations in rigor are notable. The LER comparisons at lower physical error rates (p=0.003) show some regimes where the pre-decoder + PyMatching underperforms PyMatching alone, particularly for model 1 at larger distances (Table V). The paper acknowledges this but attributes it to training distribution rather than providing a fix. The model 6 results for correlated matching show degradation at d≥17, which is a significant limitation given that large code distances are the primary motivation.

3. Potential Impact

Real-time decoding for FTQC. The O(1μs) per-round decoding runtime at large code distances is a critical milestone. With syndrome measurement times on the order of 1μs for superconducting platforms, meeting this budget is essential to avoid exponential backlogs. The demonstrated speedups of up to 3.4× over uncorrelated PyMatching and 3.5× over correlated PyMatching at d=31 are practically significant.
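The backlog argument can be made concrete with a simplified model, assuming the standard picture in which the computation must wait for the decoder at each blocking (feed-forward) gate. While the decoder clears N pending rounds, the hardware produces N * f new ones, where f = t_decode / t_round. The function and parameter names are hypothetical.

```python
def backlog_rounds(t_round_us, t_decode_us, initial_rounds, num_blocking_gates):
    """Pending syndrome rounds at each successive blocking gate.

    Simplified model: clearing `pending` rounds takes pending * t_decode_us,
    during which pending * (t_decode_us / t_round_us) new rounds arrive.
    """
    f = t_decode_us / t_round_us
    pending = initial_rounds
    history = []
    for _ in range(num_blocking_gates):
        pending = pending * f
        history.append(pending)
    return history


slow = backlog_rounds(1.0, 2.0, 10, 3)   # decoder 2x too slow: backlog doubles each time
fast = backlog_rounds(1.0, 0.5, 10, 3)   # decoder 2x faster: backlog halves each time
```

The backlog grows geometrically whenever f > 1 and shrinks whenever f < 1, which is why a per-round decoding time at or below the roughly 1 μs measurement time matters.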

Scalability to lattice surgery. The architecture's natural compatibility with spatial and temporal parallelism makes it relevant for lattice surgery operations where merged patches can have effective distances d_eff >> 100. The batching analysis (Section VII, Table XIII) showing up to 12.5× reduction in parallel resources is particularly relevant for this regime.

Noise-agnostic decoding. The noise-learning architecture addresses a genuine practical need—real quantum hardware rarely matches idealized noise models. The ability to match or exceed correlated PyMatching performance using only syndrome statistics (Figure 20a) has immediate practical value for experimental groups.

Open source. The complete implementation is publicly available, which will facilitate adoption and reproducibility.

4. Timeliness & Relevance

This work is highly timely. The field is actively transitioning from proof-of-concept QEC demonstrations to scalable implementations. Google's recent work [16, 17] on neural decoders, combined with increasing experimental code distances, creates urgent demand for decoders that are simultaneously fast and accurate. The focus on GPU deployment reflects the likely classical computing architecture for FTQC control systems. The NVIDIA authorship and GB300 hardware access position this work at the intersection of quantum computing and high-performance classical computing infrastructure.

5. Strengths & Limitations

Key Strengths:

First demonstration of simultaneous LER improvement and runtime reduction with an AI pre-decoder

Comprehensive data-processing pipeline (Algorithms 1-3) that substantially improves training quality

Backend-agnostic design composable with arbitrary global decoders

Distance-independent noise-learning formulas enabling single-distance training

Thorough runtime benchmarking with realistic GPU deployment

Open-source code and models

Notable Limitations:

Model 6 (needed to beat correlated matching) shows LER degradation at d≥17—the regime where pre-decoders are most needed

The noise-learning model does not improve LER when applied to pre-decoder outputs, suggesting the pre-decoder's residual error structure limits downstream optimization

All results use depolarizing noise; performance under realistic, biased, or spatially-correlated noise is untested

No experimental validation on real quantum hardware data

The paper does not address lattice surgery directly, despite repeatedly motivating the work with it

FP4 quantization, mentioned as future work, may be necessary for the tightest runtime budgets

The comparison landscape is incomplete—no comparison against other recent neural decoders [17, 18] or belief propagation approaches

6. Additional Observations

The paper is exceptionally long and detailed (36 pages with appendices), which aids reproducibility but somewhat obscures the main narrative. The edge weight formulas in the appendix, while valuable for implementation, could have been relegated to supplementary material. The argument in Section VII that modest LER degradation from ReLU activations is acceptable because α≈4.39 is needed to increase code distance is a useful practical insight.

Rating: 7.5 / 10

Significance 8 · Rigor 7.5 · Novelty 6.5 · Clarity 6.5

Generated Apr 15, 2026

Comparison History (32)

vs. The Pinnacle Architecture: Reducing the cost of breaking RSA-2048 to 100 000 physical qubits using quantum LDPC codes

claude-opus-4.6 · 5/6/2026

The Pinnacle Architecture represents a potentially transformative breakthrough by demonstrating RSA-2048 factoring with ~100,000 physical qubits—an order of magnitude reduction over prior estimates. This has enormous implications for cryptography, national security, and the timeline for practical quantum computing. It advances QLDPC code architectures in a fundamental way. Paper 2, while valuable for practical decoding speedups, is more incremental—improving surface code decoders with AI pre-processing. Paper 1's result reshapes our understanding of quantum computing resource requirements and has broader cross-disciplinary impact.

vs. Decoherence in Waveguide Quantum Electrodynamics using Matrix Product States

claude-opus-4.6 · 4/21/2026

Paper 1 addresses a critical bottleneck in fault-tolerant quantum computing—real-time decoding of surface codes—with a practical, scalable AI-based solution achieving sub-microsecond decoding times. Its modular, open-source framework with noise-learning capabilities has immediate applicability to experimental quantum computing hardware. The breadth of impact is larger as it directly enables scaling of quantum computers. Paper 2, while methodologically sound, extends existing MPS methods to a more specialized domain (waveguide QED with decoherence), with narrower immediate impact and fewer transformative real-world applications.

vs. Magnetic domains stabilized by symmetry-protected zero modes

claude-opus-4.6 · 4/20/2026

Paper 2 addresses a critical practical bottleneck in fault-tolerant quantum computing—real-time decoding of surface codes—with a scalable, open-source AI-based framework achieving microsecond-level decoding times. Its immediate applicability to experimental quantum computing hardware, modular design composable with existing decoders, and novel noise-learning architecture give it broader near-term impact. While Paper 1 presents interesting theoretical findings on non-ergodic dynamics via symmetry-protected zero modes, Paper 2's direct relevance to the rapidly advancing field of quantum error correction and its practical engineering contributions position it for higher citation impact and broader influence.

vs. Towards Ultra-High-Rate Quantum Error Correction with Reconfigurable Atom Arrays

gemini-3 · 4/20/2026

While Paper 1 provides an excellent, practical solution to the surface code decoding bottleneck, Paper 2 tackles the more fundamental challenge of massive qubit overhead in quantum error correction. By demonstrating practical ultra-high-rate qLDPC codes (>1/2 encoding rate) co-designed for neutral atom arrays, Paper 2 represents a potential paradigm shift away from surface codes, offering a path to large-scale fault-tolerant quantum computing with significantly fewer physical resources.

vs. Integrable, Mixed, and Chaotic Dynamics in a Single All-to-All Ising Spin Model

claude-opus-4.6 · 4/17/2026

Paper 1 addresses a critical bottleneck in fault-tolerant quantum computing—real-time decoding of surface codes—with a practical, scalable AI-based solution achieving microsecond-level decoding times on modern GPUs. It offers immediate real-world applicability to quantum error correction hardware, introduces a noise-learning architecture removing the need for explicit noise models, and is open source. Paper 2 presents interesting theoretical findings about dynamics in the Ising ATA model but has narrower impact, primarily advancing quantum chaos theory without comparable practical applications or breadth of influence.

vs. Ultrafast all-optical quantum teleportation

gpt-5.2 · 4/17/2026

Paper 2 likely has higher impact: it demonstrates a major experimental breakthrough—1‑THz all-optical quantum teleportation—removing a fundamental electronics bottleneck and enabling ultrafast quantum processing and telecom-compatible networking. The result is highly novel, timely, and broadly relevant across quantum optics, communications, and photonic computing, with clear real-world implications. Paper 1 is strong and practical (open-source, scalable surface-code decoding, data-driven noise learning), but is more incremental/engineering-oriented within fault-tolerant QEC, whereas Paper 2 sets a new capability frontier with wider cross-field resonance.

vs. Quantum chaos and the holographic principle

gpt-5.2 · 4/15/2026

Paper 1 has higher potential impact due to a novel, modular AI pre-decoder plus data-driven noise-weight learning that directly targets a central bottleneck for scalable fault-tolerant quantum computing: real-time, large-distance surface-code decoding at microsecond latencies. It offers clear real-world applicability, open-source implementation, and measurable performance improvements (runtime and logical error rates) with compatibility across global decoders. Paper 2 is a review (likely high pedagogical value) but is less methodologically innovative and its direct practical applications are more indirect, limiting near-term scientific/technological impact.

vs. Information-Theoretic Scaling Laws of Neural Quantum States

claude-opus-4.6 · 4/15/2026

Paper 2 addresses a critical practical bottleneck in fault-tolerant quantum computing—real-time decoding speed—with a concrete, open-source, hardware-deployable solution achieving O(1μs) decoding on current GPUs. Its modularity, backend-agnosticism, noise-learning capability, and demonstrated performance improvements over established decoders give it broad near-term impact as quantum hardware scales. Paper 1 provides elegant theoretical scaling laws for neural quantum states, but its impact is more niche and foundational. Paper 2's direct relevance to experimental quantum error correction and practical fault tolerance gives it higher potential impact.

vs. 1-Mbps Twin-Field Quantum Key Distribution over 200 km Using Independent Dissipative Kerr Solitons

gpt-5.2 · 4/15/2026

Paper 2 likely has higher impact due to broader applicability and timeliness for near-term fault-tolerant quantum computing: real-time surface-code decoding at ~microsecond latency, modular integration with existing decoders, open-source implementation, and data-driven noise learning without explicit hardware noise models. These advances address a key scaling bottleneck across many quantum hardware platforms and can influence both theory and systems engineering. Paper 1 is highly innovative and experimentally rigorous for quantum communications, but its impact is narrower (TF-QKD/WDM over fiber) and more domain-specific.

vs. Weak distillation of quantum resources

gpt-5.2 · 4/15/2026

Paper 2 likely has higher scientific impact due to direct, timely relevance to near-term fault-tolerant quantum computing: it targets real-time surface-code decoding with scalable, block-parallel architectures, reports concrete runtime and logical-error-rate gains, provides an open-source implementation, and introduces data-driven noise-learning useful when noise models are unknown or drifting. This combination of practical deployability, hardware-facing metrics, and broad usability across decoding backends suggests faster translation to experiments and industry. Paper 1 is conceptually novel and broadly relevant, but may have less immediate, system-level impact.

vs. Entanglement concentration via measurement:- role of imaginarity

claude-opus-4.6 · 4/15/2026

Paper 2 addresses a critical bottleneck in fault-tolerant quantum computing—real-time decoding of surface codes—with a practical, scalable AI-based solution achieving microsecond-level runtimes. Its open-source, modular architecture applicable to arbitrary backends, noise-learning capability without explicit noise models, and demonstrated performance improvements at large code distances make it highly impactful for the rapidly growing quantum error correction community. Paper 1, while interesting in formalizing imaginarity's role in entanglement protocols, addresses a more niche theoretical question with narrower practical implications.

vs. 2D quantum-path interference in high-harmonic generation driven by highly-bichromatic fields

gemini-3 · 4/15/2026

Paper 1 addresses a critical bottleneck in fault-tolerant quantum computing—scalable and fast error correction. By leveraging AI to achieve microsecond-level decoding times and introducing a data-driven noise-learning architecture, it offers highly practical, real-world applications that could significantly accelerate the development of functional quantum computers. Paper 2 presents valuable fundamental physics insights into high-harmonic generation spectroscopy, but its impact is narrower and largely confined to ultrafast atomic physics, making Paper 1 more broadly impactful and timely.

vs. Fault-Tolerant One-Shot Entanglement Generation with Constant-Sized Quantum Devices in the Plane

claude-opus-4.6 · 4/15/2026

Paper 1 presents a fundamental theoretical breakthrough: the first protocol for fault-tolerant one-shot entanglement generation over arbitrary distances using constant-sized quantum devices in 2D, with implications for quantum networks, quantum error correction, and condensed matter physics (Gibbs states with long-range localizable entanglement). This resolves open questions about locality constraints and entanglement generation. Paper 2, while practically valuable for near-term quantum computing decoding pipelines, is more incremental—improving performance of existing surface code decoders using AI pre-processing. Paper 1's conceptual novelty and breadth of impact across multiple fields gives it higher long-term scientific significance.

vs. Unlocking a fast adiabatic CZ gate and exact residual $ZZ$ cancellation between fixed-frequency transmons using a floating tunable coupler

gpt-5.2 · 4/15/2026

Paper 2 likely has higher scientific impact due to broader applicability and timeliness: scalable, real-time surface-code decoding is a key bottleneck for fault-tolerant quantum computing across hardware platforms. Its modular pre-decoder concept, microsecond-level runtimes at large code distances, open-source implementation, and data-driven noise-learning (not requiring explicit noise models) make it widely deployable and influential across quantum error correction, ML systems, and HPC. Paper 1 is highly valuable but more hardware-specific (fixed-frequency transmons with a floating tunable coupler) and its impact is narrower, despite strong experimental rigor and impressive 24 ns, >99.9% CZ performance.

vs. Granularity Noise Limit in Atomic-Ensemble-Based Metrology

gpt-5.2 · 4/15/2026

Paper 1 has higher likely impact due to strong timeliness for fault-tolerant quantum computing, a scalable modular architecture compatible with existing decoders, and concrete performance claims (µs/round on modern GPUs) plus open-source implementation—factors that accelerate adoption. Its noise-learning approach addresses practical hardware uncertainty, broadening real-world applicability. Paper 2 offers an elegant theoretical correction to common approximations in atomic-ensemble metrology with potentially important implications, but impact may be narrower and depend on experimental validation across platforms.

vs. Noise-enhanced quantum kernels on analog quantum computers

gpt-5.2 · 4/15/2026

Paper 1 has higher impact potential due to strong timeliness for fault-tolerant quantum computing, clear real-world applicability (real-time surface-code decoding), and a modular architecture that composes with existing global decoders. It reports concrete performance (≈O(1 μs)/round on modern GPUs), open-source implementation, and a data-driven noise-learning method relevant for unknown/time-varying hardware noise—broadly useful across experimental platforms. Paper 2 is novel in analog/noise-enhanced kernels and has interesting implications for QML, but its applications are narrower and near-term advantage claims are less directly tied to critical quantum-computing bottlenecks.

vs. Universal Robust Quantum Gates via Doubly Geometric Control

gemini-3 · 4/15/2026

While Paper 1 offers a significant theoretical advancement in geometric quantum control, Paper 2 addresses a critical and immediate bottleneck in fault-tolerant quantum computing: real-time syndrome decoding. By leveraging AI to achieve microsecond-level scalable pre-decoding on standard GPUs and introducing a data-driven noise-learning architecture, Paper 2 provides highly practical, open-source tools with immediate real-world applicability, giving it a broader and more immediate scientific and technological impact.

vs. Three Hamiltonians are Sufficient for Unitary $k$-Design in Temporal Ensemble

gemini-3 · 4/15/2026

Paper 1 addresses a critical bottleneck in realizing fault-tolerant quantum computing: real-time error correction. Its practical, highly scalable AI-based pre-decoder achieves microsecond decoding times and can learn from hardware noise, offering immediate, broad applications for experimental quantum scaling. Paper 2 presents a valuable theoretical advance in quantum information regarding unitary k-designs, but Paper 1's direct path to enabling large-scale quantum computers gives it a higher potential for widespread, transformative scientific and technological impact.

vs. Opportunistic QKD: Exploiting Idle Capacity of Classical WDM Systems

gpt-5.2 · 4/15/2026

Paper 1 has higher impact potential: it targets a core bottleneck for fault-tolerant quantum computing (real-time surface-code decoding) with a modular, backend-agnostic AI pre-decoder that composes with existing global decoders and includes data-driven noise/weight learning when models are unknown—high novelty and strong applicability. It reports concrete performance (≈O(1 µs)/round on modern GPUs), open-source implementation, and logical error-rate improvements, indicating methodological rigor and near-term relevance. Paper 2 is practical for QKD integration but is more incremental and simulation/modeling-heavy, with narrower cross-field impact.

vs. Minimal noise in non-quantized gravity

gemini-3 · 4/15/2026

Paper 2 addresses a critical, immediate bottleneck in fault-tolerant quantum computing (real-time decoding) with a practical, scalable AI-based solution. Its direct real-world applicability and potential to significantly accelerate the development of large-scale quantum computers give it broader and more immediate technological impact compared to Paper 1, which focuses on profound but specialized fundamental physics tests.