Accuracy-Cost Trade-offs for Reference VQE Calculations of H$_2$ on IBM Quantum Hardware

Julen Larrucea, Marita Oliv, Jeanette Lorenz

Apr 13, 2026

arXiv:2604.11478v1

quant-ph (primary)

#2494 of 2593 · Quantum Physics

Tournament Score: 1238 ± 36

Win Rate: 22%

Rating: 3.5/10

Significance 3 · Rigor 4 · Novelty 2.5 · Clarity 7

Abstract

We present a hardware-validated reference dataset for variational ground-state energy calculations of the hydrogen molecule H$_{2}$ on several IBM Quantum processors available in 2026. Using a standardized workflow, we benchmark the impact of shot count, backend choice, optimization strategy, and runtime variability on the achievable energy accuracy relative to exact diagonalization. The resulting dataset and analysis provide a transparent baseline for assessing the current capabilities and limitations of IBM Quantum hardware for quantum-chemistry applications, and are meant to ease the entry for new users by providing a comprehensive overview of choices and their effects as well as runtime efforts and costs that can be expected. Across the configurations studied here, circuit simplification through tapered mappings provides the most consistent accuracy gains, resilience level 1 improves accuracy at a substantial cost premium, and session-based execution yields no systematic accuracy advantage over single-job execution despite markedly higher billed time.

AI Impact Assessments

(3 models)

Scientific Impact Assessment

Core Contribution

This paper provides a systematic, hardware-validated benchmarking dataset for Variational Quantum Eigensolver (VQE) calculations of the hydrogen molecule (H₂) across multiple IBM Quantum processors available in 2025-2026. The core contribution is not algorithmic innovation but rather a practical reference guide that quantifies how standard workflow parameters—shot count, backend choice, fermion-to-qubit mapping, resilience levels, and execution mode (session vs. single-job)—affect achievable energy accuracy, runtime, and billed cost. The paper explicitly targets new or non-specialist users, positioning itself as a tutorial-aligned empirical resource.

The main findings are: (1) circuit simplification through symmetry-tapered mappings (PT) provides the most consistent accuracy gains; (2) resilience level 1 improves accuracy but at substantial cost; (3) session-based execution yields no systematic accuracy advantage over single-job execution despite markedly higher billing; (4) increasing shots beyond ~1024 provides diminishing returns; and (5) backend variability is a significant source of performance differences.
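The diminishing return from additional shots in finding (4) is consistent with ordinary sampling statistics. The sketch below is illustrative only and is not taken from the paper: it assumes a single-qubit Z measurement with a made-up outcome probability (`p_zero = 0.9`) and models shot noise alone, ignoring hardware noise. Under those assumptions the standard error falls as 1/sqrt(shots), so quadrupling the shot count only halves the statistical error.

```python
import math
import random

def standard_error(p_zero, shots):
    """Analytic shot-noise standard error of a <Z> estimate.

    p_zero is the probability of measuring 0 (eigenvalue +1).
    """
    variance = 4.0 * p_zero * (1.0 - p_zero)  # variance of one +/-1 outcome
    return math.sqrt(variance / shots)

def sample_z(p_zero, shots, rng):
    """Simulate `shots` ideal measurements and return the estimated <Z>."""
    zeros = sum(1 for _ in range(shots) if rng.random() < p_zero)
    return (2 * zeros - shots) / shots

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for shots in (256, 1024, 4096, 16384):
    # p_zero = 0.9 is a hypothetical value chosen for illustration
    print(shots, standard_error(0.9, shots), sample_z(0.9, shots, rng))
```

Once the shot-noise term drops below the device's systematic error, further shots add cost without a matching accuracy gain, which is one plausible reading of the ~1024-shot observation.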

Methodological Rigor

The methodology is straightforward and appropriate for its stated purpose. The authors use a standardized Qiskit Nature workflow with minimal customization, which is deliberate: they aim to capture the "out-of-the-box" experience. The use of exact diagonalization as a reference and the consistent reporting of energy deviations ($E_{\mathrm{err}}$) is sound.
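As a rough illustration of what an energy deviation against an exact-diagonalization reference looks like, the sketch below diagonalizes a 2x2 real symmetric matrix in closed form and subtracts the result from a list of run energies. The matrix entries and run energies are placeholders, not values from the paper, and the signed convention (measured minus exact) is an assumption; the paper may report absolute deviations instead.

```python
import math

def lowest_eigenvalue_2x2(h00, h01, h11):
    """Exact lowest eigenvalue of the real symmetric matrix [[h00, h01], [h01, h11]]."""
    mean = (h00 + h11) / 2.0
    half_gap = math.hypot((h00 - h11) / 2.0, h01)
    return mean - half_gap

def energy_errors(measured, e_exact):
    """Signed deviations E_measured - E_exact, one per run (assumed convention)."""
    return [e - e_exact for e in measured]

# Placeholder Hamiltonian entries and run energies, for illustration only.
e_exact = lowest_eigenvalue_2x2(-0.6, 0.2, -1.4)
runs = [-1.43, -1.40, -1.45]
print(e_exact, energy_errors(runs, e_exact))
```

For an ideal variational estimate the deviation is non-negative; negative values in hardware data would indicate sampling or mitigation noise rather than a better-than-exact energy.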

However, there are notable limitations in rigor. The repeat counts are unbalanced across configurations: some settings have only 1-2 data points while others have up to 10. The authors acknowledge this, but it significantly weakens the statistical conclusions, particularly for resilience level comparisons where some backends have a single measurement. The claim that resilience level 1 "improves accuracy" for all backends is hard to substantiate when some backends have n=1 at that level. The percentage improvements reported in Figure 9 (e.g., 72%, 93%) compare means of very small, unequal samples.
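The concern about small, unequal samples can be made concrete with a short sketch. The error values below are invented for illustration and are not the paper's data; the helper names are likewise hypothetical. The point is that a percentage improvement between two means can always be computed, but with n=1 in one group there is no standard error to attach to it.

```python
import math
import statistics

def summarize(errors):
    """Return sample size, mean, and standard error (None when n < 2)."""
    n = len(errors)
    mean = statistics.fmean(errors)
    sem = statistics.stdev(errors) / math.sqrt(n) if n > 1 else None
    return {"n": n, "mean": mean, "sem": sem}

def relative_improvement(baseline, treated):
    """Percent reduction in mean error from baseline to treated."""
    b = statistics.fmean(baseline)
    t = statistics.fmean(treated)
    return 100.0 * (b - t) / b

# Hypothetical absolute energy errors (Hartree) for one backend.
level_0 = [0.050, 0.030, 0.070, 0.045]
level_1 = [0.012]  # a single run: no spread can be estimated

print(summarize(level_0))
print(summarize(level_1))
print(relative_improvement(level_0, level_1))
```

A headline percentage derived this way says nothing about whether a second run at the same setting would land near the first.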

The study design captures both temporal calibration drift and spatial qubit heterogeneity simultaneously without attempting to deconvolve these effects, which limits the interpretive depth. The paper also does not fix transpiled circuits across runs, which adds variability but makes it harder to isolate individual noise sources.

Potential Impact

The paper serves primarily as an educational and practical resource rather than a scientific advancement. Its value lies in:

1. Lowering barriers to entry: New quantum computing users in chemistry can use this as a calibrated expectations guide before committing resources.

2. Cost awareness: The explicit tracking and reporting of billed time vs. quantum time across execution modes fills a gap in the literature, where cost is rarely discussed.

3. Challenging common assumptions: The finding that session mode provides no accuracy advantage despite higher cost, and that more complex error mitigation doesn't always help, counters some prevailing recommendations.
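To show how billed time and quantum time might be compared across execution modes, here is a minimal bookkeeping sketch. The record fields, mode labels, and numbers are all hypothetical and do not come from the paper; it assumes "quantum time" means time spent executing on the processor and "billed time" means what is charged to the account.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean

@dataclass
class RunRecord:
    mode: str               # illustrative label, e.g. "session" or "job"
    quantum_seconds: float  # assumed: time spent executing on the processor
    billed_seconds: float   # assumed: time charged to the account
    abs_error_mha: float    # absolute energy error in millihartree

    @property
    def overhead(self) -> float:
        """Billed time per unit of quantum time."""
        return self.billed_seconds / self.quantum_seconds

def summarize_by_mode(records):
    """Group runs by execution mode and average overhead and error."""
    groups = defaultdict(list)
    for record in records:
        groups[record.mode].append(record)
    return {
        mode: {
            "n": len(runs),
            "mean_overhead": fmean(r.overhead for r in runs),
            "mean_abs_error_mha": fmean(r.abs_error_mha for r in runs),
        }
        for mode, runs in groups.items()
    }

# Placeholder records, not measured values.
records = [
    RunRecord("session", 10.0, 40.0, 5.0),
    RunRecord("job", 10.0, 12.0, 5.0),
]
print(summarize_by_mode(records))
```

Reporting overhead next to error per mode makes it easy to see when a costlier mode buys no accuracy, which is the pattern the review attributes to session execution.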

The impact is limited by the extremely simple test case (H₂ in STO-3G basis). While this is justified for isolating workflow effects, the generalizability to even modestly larger systems (e.g., LiH, H₂O) remains uncertain. The authors acknowledge this limitation but don't provide any bridging experiments.

The dataset itself could have lasting value if made publicly available in a structured format, though the paper doesn't explicitly describe a data repository or provide access details.

Timeliness & Relevance

The work is timely in the sense that many groups and newcomers are actively exploring quantum chemistry on NISQ hardware, and practical guidance is genuinely needed. The specific hardware tested (Heron r1/r2/r3, Eagle r3, Nighthawk r1) represents the current IBM fleet, making results immediately relevant to active users.

However, the rapid evolution of quantum hardware means these specific numerical results will become outdated relatively quickly. The methodology is more durable than the specific findings. Additionally, the quantum computing community is increasingly moving beyond VQE toward more scalable approaches (e.g., quantum phase estimation with error correction), which somewhat reduces the long-term relevance.

Strengths

Practical orientation: Explicitly designed to help real users make informed decisions, with cost tracking that is rarely reported in academic papers.

Multi-dimensional comparison: Systematic variation across backends, shots, mappings, resilience levels, and execution modes provides a comprehensive picture.

Transparency: The use of default, tutorial-aligned workflows without hand-optimization makes results reproducible and representative of typical user experience.

Clear presentation: The paper is well-organized with informative figures that effectively convey the multi-factor comparison.

Limitations

Trivially small benchmark: H₂ in minimal basis is a single-parameter optimization problem for the PT mapping, making it unclear how findings scale.

Unbalanced sampling: Uneven repeat counts (n=1 to n=10) across configurations weaken statistical conclusions.

Limited novelty: The individual findings (smaller circuits work better on noisy hardware, more shots have diminishing returns) are largely expected and have been observed in prior work. The contribution is primarily in systematizing and quantifying these known effects.

No dataset release details: Despite being framed as a "reference dataset," the paper doesn't describe formal data availability.

Narrow scope: Only IBM hardware, only COBYLA (with brief SPSA comparison), only UCC ansatz, only one molecule at one geometry.

Temporal specificity: Results are tightly coupled to hardware calibrations during Aug 2025–Apr 2026 and may not transfer to future hardware revisions.
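The first limitation above notes that H₂ in a minimal basis becomes a single-parameter optimization under the PT mapping. The sketch below shows how small such a problem is, under stated assumptions: a one-qubit Hamiltonian of the form a·I + b·Z + c·X with toy coefficients (not the actual H₂ values), a single Ry(θ) rotation as the trial state, and a noiseless energy function. It uses a grid scan plus golden-section search in place of the COBYLA optimizer used in the paper.

```python
import math

def energy(theta, a, b, c):
    """<psi|H|psi> for H = a*I + b*Z + c*X and |psi> = Ry(theta)|0>."""
    return a + b * math.cos(theta) + c * math.sin(theta)

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize f on [lo, hi], assuming f is unimodal there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2.0

def optimize_theta(a, b, c, grid=64):
    """Coarse scan over one period, then refine around the best grid point."""
    f = lambda t: energy(t, a, b, c)
    step = 2.0 * math.pi / grid
    best = min((i * step for i in range(grid)), key=f)
    return golden_section(f, best - step, best + step)

# Toy coefficients chosen for illustration only.
A, B, C = -1.0, 0.4, 0.2
theta_opt = optimize_theta(A, B, C)
e_opt = energy(theta_opt, A, B, C)
e_exact = A - math.hypot(B, C)  # exact ground energy of a*I + b*Z + c*X
print(theta_opt, e_opt - e_exact)
```

With one parameter and a sinusoidal landscape, almost any optimizer converges, so optimizer comparisons on this system may not carry over to ansatzes with many coupled parameters.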

Overall Assessment

This is a competent benchmarking study that fills a practical gap for new quantum computing users in chemistry. It is more of a technical report or application note than a research paper advancing scientific understanding. The findings, while useful, are largely confirmatory of expected behavior. The unbalanced experimental design and trivially small test case limit the strength of conclusions. Its primary value is pedagogical and as a cost-awareness resource for the quantum computing community.

Rating: 3.5/10

Significance 3 · Rigor 4 · Novelty 2.5 · Clarity 7

Generated Apr 19, 2026

Comparison History (36)

vs. Understanding oxide-thickness-dependent variability in dense Si-MOS quantum dot arrays

claude-opus-4.6 · 5/14/2026

Paper 1 addresses a critical scalability challenge for semiconductor quantum computing—uniformity in dense silicon quantum dot arrays—using industrially relevant 300mm CMOS processes with EUV lithography. The statistical characterization of 392 quantum dots across oxide thicknesses provides actionable design guidelines for scalable architectures. Paper 2 is a benchmarking study of VQE for H₂ on IBM hardware, which is incremental given extensive prior H₂ VQE studies. While useful as a reference dataset, it addresses a well-trodden problem with limited novelty. Paper 1's contributions to silicon qubit scalability have broader and more lasting impact.

vs. Perspective on tailoring quantum coherence with electron beams

gemini-3.1 · 5/12/2026

Paper 2 offers a forward-looking perspective on a novel fundamental technique (electron beams) to manipulate quantum coherence and entanglement, which has broad implications across condensed matter physics, materials science, and next-generation quantum computing hardware. In contrast, Paper 1 is a highly specific benchmarking study on current quantum hardware that, while practically useful now, is likely to become obsolete quickly as hardware and algorithms evolve.

vs. Rethinking How to Act: Action-Space Engineering for Reinforcement Learning-Based Circuit Routing in Distributed Quantum Systems

gemini-3 · 5/5/2026

Paper 2 addresses a critical scalability bottleneck in quantum computing—distributed quantum circuit routing—with a novel reinforcement learning approach yielding a significant 35% execution time reduction. This has broad implications for scaling quantum architectures. In contrast, Paper 1 is primarily a benchmarking study for a toy molecule (H2) on existing NISQ hardware. While Paper 1 provides practical value for current users, Paper 2 introduces methodological innovations that advance the long-term feasibility of large-scale distributed quantum computing, resulting in higher potential scientific impact.

vs. Rethinking How to Act: Action-Space Engineering for Reinforcement Learning-Based Circuit Routing in Distributed Quantum Systems

claude-opus-4.6 · 5/5/2026

Paper 2 addresses the forward-looking challenge of distributed quantum computing, proposing a novel RL-based approach with concrete performance improvements (up to 35% reduction in execution time). It tackles a scalability bottleneck that will become increasingly important as quantum systems grow. Paper 1, while valuable as a benchmarking reference for H₂ VQE on IBM hardware, is largely incremental—documenting known trade-offs on a well-studied toy problem. Its findings (e.g., tapering helps, error mitigation costs more) are relatively unsurprising. Paper 2 has broader methodological novelty and greater potential to influence the distributed quantum computing and compilation communities.

vs. Entanglement Distance of Two- and Multi-Qubit Variational States and Its Quantification with Quantum Computing

gemini-3 · 5/5/2026

Paper 1 offers a highly practical, hardware-validated benchmark dataset addressing the immediate challenge of accuracy-cost trade-offs in near-term quantum computing. Its standardized workflow and actionable insights on runtime, optimization, and costs will serve as an essential baseline for researchers across quantum chemistry and hardware evaluation, likely yielding broader immediate use and higher citation impact than the specific theoretical derivations in Paper 2.

vs. Entanglement Distance of Two- and Multi-Qubit Variational States and Its Quantification with Quantum Computing

gemini-3 · 5/5/2026

Paper 1 addresses the critical real-world challenge of cost-accuracy trade-offs in near-term quantum computing. By providing a hardware-validated reference dataset and practical benchmarks for VQE, it offers immediate utility to practitioners in quantum chemistry. While Paper 2 presents rigorous theoretical work on entanglement distance, Paper 1's focus on standardized baselines and practical hardware limitations makes it highly timely and likely to be widely cited as a reference standard in applied quantum algorithms.

vs. Sector-dominant graph-local drivers for path-window barrier Hamiltonians on the Boolean hypercube

claude-opus-4.6 · 4/29/2026

Paper 2 addresses a practical, widely relevant problem—benchmarking VQE on real quantum hardware for quantum chemistry—providing a transparent, reproducible reference dataset that serves the growing quantum computing community. It offers actionable insights on cost-accuracy tradeoffs (shot count, error mitigation, session management) on current IBM hardware, making it immediately useful for practitioners. Paper 1 is highly specialized, focusing on niche graph-local driver constructions for adiabatic state preparation on Boolean hypercubes with mixed/negative results and limited generalizability beyond finite-size cases. Paper 2's broader audience, practical utility, and timeliness give it higher impact potential.

vs. Universal Complex Quantum-Like Bits from Hermitian Weighted Graphs

gpt-5.2 · 4/28/2026

Paper 1 is more novel and broadly impactful: it develops a universality taxonomy for realizing arbitrary complex two-level states via Hermitian weighted graph couplings, with rigorous proofs, structural mechanisms (Hermitian conjugate pairing), and a discrete construction over {0,±1,±i} tied to perfect matchings. This can influence quantum walk/synchronization models, spectral graph theory, and quantum-information-inspired architectures. Paper 2 is timely and practically useful as a hardware benchmark dataset, but it is incremental (H2 VQE) and narrower in scientific reach, with impact mainly as a reference/engineering baseline rather than a new theoretical framework.

vs. Universal Complex Quantum-Like Bits from Hermitian Weighted Graphs

gemini-3 · 4/28/2026

Paper 2 presents a fundamental theoretical framework proving the universality of complex quantum-like bits from Hermitian weighted graphs. While Paper 1 provides a useful, timely empirical benchmark for near-term quantum hardware, such hardware-specific evaluations often become obsolete as technology rapidly advances. In contrast, Paper 2 establishes robust, mathematically rigorous structural mechanisms bridging graph theory and quantum state realization, offering deeper, longer-lasting scientific impact and broader potential across theoretical physics and quantum network design.

vs. Multidimensional semiclassical single- and double-quantum spectroscopy of anharmonic molecular polaritons

claude-opus-4.6 · 4/24/2026

Paper 2 presents a novel theoretical framework for computing multidimensional spectra of anharmonic molecular polaritons, solving a longstanding puzzle (polariton bleach effect) and opening new avenues for understanding cavity-enhanced molecular phenomena. It combines methodological innovation (semiclassical large-N approach with phase cycling) with broad applicability across polariton chemistry, nonlinear spectroscopy, and light-matter coupling. Paper 1, while useful as a benchmarking resource, is primarily an incremental hardware characterization of a simple well-studied system (H₂) on current quantum devices, with limited novelty and narrower scientific impact.

vs. Lagrange: Operating Italy's First Publicly-Accessible Quantum Computer for Research and Education

gemini-3 · 4/24/2026

Paper 1 provides a rigorous benchmarking dataset for VQE calculations, directly contributing to quantum chemistry and the evaluation of near-term quantum algorithms. It advances our understanding of algorithmic accuracy, error mitigation, and hardware trade-offs. In contrast, Paper 2 is primarily an IT infrastructure and software engineering report on managing user access to a quantum computer. While highly valuable for education and administration, Paper 1 offers greater fundamental scientific impact.

vs. Comment on "Quantum Limits to Incoherent Imaging are Achieved by Linear Interferometry"

gemini-3 · 4/23/2026

Paper 1 provides a practical benchmarking dataset and analysis for quantum chemistry on near-term quantum hardware, serving as a valuable resource and baseline for a wide community of researchers. In contrast, Paper 2 is a specific technical correction to a single theoretical paper, which, while important, has a much narrower scope and limited overall scientific impact compared to the empirical baseline provided by Paper 1.

vs. Comment on "Quantum Limits to Incoherent Imaging are Achieved by Linear Interferometry"

claude-opus-4.6 · 4/23/2026

Paper 2 corrects a flawed derivation in an influential quantum imaging paper, providing the correct optimal interferometric configuration for achieving quantum Fisher information limits. This has broader theoretical impact across quantum optics, imaging, and metrology. Paper 1, while useful as a practical benchmarking reference for VQE on IBM hardware, addresses a well-studied toy problem (H₂) with limited novelty and narrow applicability. The correction in Paper 2 affects foundational understanding in quantum-limited imaging, which has wider scientific ramifications.

vs. Nonuniversal beyond-LHY corrections to thermodynamic properties of a weakly interacting Bose gas

claude-opus-4.6 · 4/23/2026

Paper 1 addresses a fundamental theoretical question about beyond-LHY corrections in weakly interacting Bose gases, contributing new nonuniversal results to many-body quantum physics with broad relevance to ultracold atomic physics and condensed matter. Paper 2, while useful as a practical benchmarking study for quantum computing users, is primarily a hardware-specific reference dataset for a well-studied molecule (H₂) on 2026-era IBM hardware, making it more of a snapshot with limited long-term impact as hardware rapidly evolves. Paper 1's theoretical contributions have greater lasting significance.

vs. Jaynes-Cummings dynamics in strong coupling for many-interacting-qubit quantum Rabi models

gemini-3 · 4/23/2026

Paper 2 provides a highly practical, hardware-validated dataset and baseline for VQE calculations, which is immediately applicable to the fast-growing field of quantum chemistry on near-term hardware. Its empirical insights into accuracy-cost trade-offs offer broader near-term utility for algorithm developers and practitioners compared to the more specialized theoretical quantum dynamics explored in Paper 1.

vs. Continuous-time quantum-walk centrality for protein residue interaction networks

gpt-5.2 · 4/21/2026

Paper 2 is more novel and broadly applicable: it introduces a quantum-dynamical centrality measure for protein residue interaction networks, provides analytic grounding, evaluates on a sizable (~150 protein) dataset, and links results to known functional residues with a hardware proof-of-principle. This positions it at the intersection of quantum information, network science, and structural biology, with potential downstream impacts in protein function annotation and drug discovery. Paper 1 is valuable as a benchmarking/reference dataset but is narrower (H2 VQE on IBM devices) and primarily incremental, with limited cross-field reach.

vs. Long-term Performance Analysis of a Commercial QKD Device Under Real-world Deployment Conditions

gemini-3 · 4/20/2026

Paper 1 provides critical, long-term real-world data on quantum network infrastructure deployment, addressing practical challenges like thermal bottlenecks in tropical climates. This contributes directly to the scaling of quantum-safe networks. Paper 2, while useful for benchmarking, focuses on a highly studied toy problem (VQE of H2) and serves primarily as a baseline guide for hardware users rather than pushing the scientific boundaries of quantum computing.

vs. Strain-induced modification of spin-optical dynamics in silicon vacancy centers for integrated quantum technologies

claude-opus-4.6 · 4/20/2026

Paper 2 addresses a fundamental knowledge gap in silicon vacancy center physics that is critical for practical quantum device integration. It provides new physical insights into spin-strain dynamics using both experimental and first-principles methods, with broad implications for solid-state quantum technologies in CMOS-compatible materials. Paper 1, while useful as a benchmarking reference, is primarily an incremental engineering study on a toy problem (H₂) using specific hardware that will quickly become outdated, offering limited lasting scientific contribution beyond immediate practical guidance for IBM Quantum users.

vs. From GDSII to Wafer: EDA Design Flow and Data Conversion for Wafer-Scale Manufacturing of Superconducting Quantum Chips

gpt-5.2 · 4/19/2026

Paper 2 has higher potential impact due to its broader applicability and timeliness: wafer-scale superconducting chip manufacturing and Q-EDA infrastructure are key bottlenecks for scaling to large qubit counts. It proposes a systematic, end-to-end design-to-foundry data conversion pipeline (GDSII→PDK/DRC/LVS/DFM→MDP), quantum-specific DRC rules, tool benchmarking, and an architecture that can influence both academia and industry across quantum hardware, EDA, and manufacturing. Paper 1 is useful but narrower (H2 VQE benchmarking on IBM devices) and likely has limited cross-field reach and longer-term influence.

vs. Low-dose Image Recognition with Quantum Computational Electron Microscopy

claude-opus-4.6 · 4/19/2026

Paper 1 proposes a fundamentally novel concept—integrating quantum computation with electron microscopy via qudits for low-dose imaging of beam-sensitive specimens. This represents a genuinely new paradigm bridging quantum information science and electron microscopy, with broad potential impact across materials science, structural biology, and quantum computing. Paper 2, while useful as a practical benchmarking study of VQE on IBM hardware for H₂, is incremental—H₂ VQE is a well-studied problem, and the results are hardware-specific and time-limited. Paper 1's novelty and cross-disciplinary breadth give it substantially higher impact potential.
