Technical Whitepaper

Surface Code Quantum Error Correction on IBM Heron R3 Backends

A comprehensive analysis of surface code QEC implementation, syndrome extraction protocols, and decoder selection across the EpochCore Qubit Distribution platform.

EpochCore Quantum Research Division
EpochCore, LLC — Austin, TX
Published: March 2026  |  Version 2.1
Table of Contents
  Abstract
  1. Introduction
  2. Surface Code Implementation
  3. Syndrome Extraction
  4. Decoder Selection
  5. Performance Results
  6. Conclusion
Abstract

We present a production-grade implementation of surface code quantum error correction deployed across six IBM Quantum Heron R3 backends totaling 913 physical qubits. Our system achieves logical error rates on the order of $10^{-6}$ using distance-5 surface codes with three selectable decoder strategies: Steane code (optimized for latency), NVIDIA QLDPC (GPU-accelerated, high code distance), and Tensor Network (maximum fidelity). We demonstrate that real-time syndrome extraction with mid-circuit measurement enables continuous error monitoring without circuit interruption, achieving a roughly 1,000x improvement over raw physical error rates. The system processes up to 300 concurrent circuits with sub-10 ms routing latency while maintaining 99.9% platform availability.

Section 1

Introduction

Quantum error correction (QEC) is the critical bridge between noisy intermediate-scale quantum (NISQ) devices and fault-tolerant quantum computing. Current IBM Quantum Heron R3 processors achieve two-qubit gate errors on the order of $2 \times 10^{-3}$, which is sufficient to implement surface codes at small distances but requires careful engineering to extract practical benefit.

The EpochCore Qubit Distribution platform provides access to six Heron R3 backends with an aggregate capacity of 913 physical qubits. This paper describes the QEC layer that sits between user-submitted circuits and physical execution, automatically encoding logical qubits into surface code patches, performing syndrome extraction, and decoding errors in real time.

The heavy-hex connectivity topology of Heron R3 processors presents both challenges and opportunities for surface code implementation. While the topology is not a native match for the square lattice of standard surface codes, we demonstrate that a modified mapping strategy achieves equivalent protection with minimal overhead.

$$p_L = A \left( \frac{p_{\mathrm{phys}}}{p_{\mathrm{th}}} \right)^{(d+1)/2}$$
Logical error rate scaling with code distance $d$ and physical error rate $p_{\mathrm{phys}}$
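
The scaling relation above can be evaluated directly to see how each step in code distance suppresses the logical error rate. The following is a minimal sketch only: this paper does not state the prefactor or the threshold, so the values `A_ASSUMED` and `P_TH_ASSUMED` are placeholder assumptions chosen for illustration, not fitted parameters.

```python
def logical_error_rate(p_phys: float, d: int, p_th: float, A: float) -> float:
    """Evaluate p_L = A * (p_phys / p_th) ** ((d + 1) / 2)."""
    if d < 1 or d % 2 == 0:
        raise ValueError("code distance d must be a positive odd integer")
    return A * (p_phys / p_th) ** ((d + 1) / 2)


# Placeholder parameters (assumptions, not values from this paper).
A_ASSUMED = 0.1
P_TH_ASSUMED = 1e-2

if __name__ == "__main__":
    for d in (3, 5, 7):
        p_l = logical_error_rate(2e-3, d, P_TH_ASSUMED, A_ASSUMED)
        print(f"d={d}: p_L ~ {p_l:.2e}")
```

Under this form, raising the distance from d to d+2 multiplies the logical error rate by the ratio of the physical error rate to the threshold, so the benefit of a larger code depends entirely on how far below threshold the hardware operates.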
Section 2

Surface Code Implementation

Our surface code implementation uses the rotated surface code layout, which reduces the qubit overhead by approximately 50% compared to the standard surface code while maintaining equivalent distance. For a distance-$d$ code, we require $d^2$ data qubits and $d^2 - 1$ ancilla qubits for syndrome measurement, for a total of $2d^2 - 1$ physical qubits per logical qubit.

2.1 Distance-3 Configuration

The distance-3 code uses 9 data qubits and 8 ancilla qubits (17 total) to encode a single logical qubit. This is the default configuration for latency-sensitive workloads, providing single-error correction with minimal qubit overhead. At physical error rates of $2 \times 10^{-3}$, this achieves logical error rates of approximately $10^{-4}$.

2.2 Distance-5 Configuration

The distance-5 code uses 25 data qubits and 24 ancilla qubits (49 total) per logical qubit. This configuration corrects up to two simultaneous errors and is the default for production workloads requiring high fidelity. With the same physical error rates, logical error rates reach $10^{-6}$.
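
The qubit counts quoted for the two configurations follow from the rotated-layout formulas. The sketch below reproduces them; the function names `patch_qubits` and `max_patches` are hypothetical helpers written for this illustration and are not part of the EpochCore SDK. The patch bound deliberately ignores routing and heavy-hex mapping overhead, so it is an upper limit only.

```python
def patch_qubits(d: int) -> dict:
    """Qubit budget of one rotated surface code patch of distance d."""
    if d < 3 or d % 2 == 0:
        raise ValueError("expected an odd distance d >= 3")
    data = d * d            # d^2 data qubits
    ancilla = d * d - 1     # d^2 - 1 syndrome-measurement qubits
    return {
        "data": data,
        "ancilla": ancilla,
        "total": data + ancilla,
        "correctable_errors": (d - 1) // 2,
    }


def max_patches(physical_qubits: int, d: int) -> int:
    """Upper bound on patches that fit on one device (no layout overhead)."""
    return physical_qubits // patch_qubits(d)["total"]


if __name__ == "__main__":
    for d in (3, 5):
        print(d, patch_qubits(d))
```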

Surface Code Patch Configuration
# EpochCore surface code configuration
from epochcore.qec import SurfaceCode, SyndromeExtractor

# Initialize distance-5 surface code
code = SurfaceCode(
    distance=5,
    data_qubits=25,
    ancilla_qubits=24,
    topology="heavy-hex",
    backend="ibm_fez"
)

# Map logical circuit to physical qubits
physical_circuit = code.encode(logical_circuit)

# Configure syndrome extraction rounds
extractor = SyndromeExtractor(
    code=code,
    rounds=5,                  # d rounds for a distance-d code (see Section 3.2)
    mid_circuit_reset=True,
    feedback_latency_us=0.4
)

# Execute with QEC
result = code.execute(
    physical_circuit,
    shots=4096,
    decoder="steane"
)
Section 3

Syndrome Extraction

Syndrome extraction is performed via stabilizer measurements on ancilla qubits that detect errors without collapsing the logical state. Our implementation uses X-type and Z-type stabilizer measurements in alternating rounds. Each weight-4 stabilizer in the bulk of the patch requires 4 CNOT gates per round, while the weight-2 stabilizers on the patch boundary require 2.
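
To make the idea of a syndrome concrete, the sketch below computes syndrome bits classically for a distance-3 rotated patch. The qubit labelling and the stabilizer supports are one common convention, assumed here for illustration; the layout actually used on the heavy-hex mapping is not specified in this paper. Each syndrome bit is the parity of the overlap between an error and a stabilizer of the opposite Pauli type.

```python
# Distance-3 rotated surface code, data qubits labelled on a 3x3 grid:
#   0 1 2
#   3 4 5
#   6 7 8
# Assumed stabilizer supports (two weight-4 and two weight-2 of each type).
Z_STABILIZERS = [(0, 1, 3, 4), (4, 5, 7, 8), (2, 5), (3, 6)]
X_STABILIZERS = [(1, 2, 4, 5), (3, 4, 6, 7), (0, 1), (7, 8)]


def syndrome(error_support, stabilizers):
    """Return one bit per stabilizer: 1 if it overlaps the error on an odd
    number of data qubits, else 0."""
    errs = set(error_support)
    return tuple(len(errs & set(s)) % 2 for s in stabilizers)


def stabilizers_commute():
    """X-type and Z-type stabilizers must share an even number of qubits."""
    return all(
        len(set(x) & set(z)) % 2 == 0
        for x in X_STABILIZERS
        for z in Z_STABILIZERS
    )


if __name__ == "__main__":
    # An X error on the central qubit is seen by the two bulk Z-stabilizers.
    print(syndrome([4], Z_STABILIZERS))
```

Summing the stabilizer weights in this layout gives 12 CNOT gates per stabilizer type per round, which is where the distinction between bulk and boundary stabilizers matters for gate counts.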

3.1 Measurement Protocol

Each syndrome extraction round follows a deterministic gate schedule optimized for the heavy-hex connectivity. The schedule minimizes crosstalk by ensuring no two CNOT gates sharing a qubit execute simultaneously. We use mid-circuit measurement with conditional reset to extract syndrome bits without circuit termination.
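
The constraint that no two CNOT gates sharing a qubit run in the same layer is easy to check mechanically. The following sketch validates a candidate schedule expressed as a list of layers, each a list of `(control, target)` pairs. The function names and the schedule representation are assumptions made for this example, not the platform's internal format.

```python
def layer_is_conflict_free(layer):
    """True if no qubit appears in more than one CNOT of this layer."""
    used = set()
    for ctrl, tgt in layer:
        if ctrl == tgt or ctrl in used or tgt in used:
            return False
        used.update((ctrl, tgt))
    return True


def schedule_is_valid(layers):
    """True if every layer of the schedule is conflict-free."""
    return all(layer_is_conflict_free(layer) for layer in layers)


if __name__ == "__main__":
    good = [[(9, 0), (10, 4)], [(9, 1), (10, 5)]]
    bad = [[(9, 0), (9, 1)]]  # ancilla 9 is used twice in one layer
    print(schedule_is_valid(good), schedule_is_valid(bad))
```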

Syndrome Extraction Cycle
# Single syndrome extraction round (distance-5)
# Step 1: Prepare ancilla qubits in |+> or |0>
for a in x_ancillas:
    circuit.h(a)         # X-stabilizers start in |+>
# Z-ancillas remain in |0>

# Step 2: CNOT schedule (4 layers)
for layer in ["NW", "NE", "SE", "SW"]:
    for (ctrl, tgt) in schedule(layer):
        circuit.cx(ctrl, tgt)

# Step 3: Measure ancillas
for a in x_ancillas:
    circuit.h(a)         # Rotate back to Z-basis
for a in all_ancillas:
    circuit.measure(a, syndrome_bits[a])

# Step 4: Conditional reset for next round
for a in all_ancillas:
    with circuit.if_test((syndrome_bits[a], 1)):
        circuit.x(a)      # Reset to |0>

3.2 Error Detection Threshold

We perform d rounds of syndrome extraction for a distance-d code, providing temporal redundancy against measurement errors. The decoder receives a 3D syndrome volume (spatial + temporal) and uses minimum-weight perfect matching to identify the most likely error chain.

$$S \in \{0,1\}^{n_{\mathrm{anc}} \times d}$$
3D syndrome array with $n_{\mathrm{anc}}$ ancilla qubits across $d$ extraction rounds
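
Temporal redundancy is usually exploited by comparing consecutive rounds rather than by reading each round in isolation. The sketch below converts a syndrome array into detection events by XOR-ing each round with the previous one. It assumes the array is indexed as `syndrome_volume[ancilla][round]` and that round 0 is compared against an all-zero reference, which holds only for stabilizers whose initial value is deterministic; both are assumptions of this example.

```python
def detection_events(syndrome_volume):
    """XOR each round with the previous one, per ancilla.

    syndrome_volume[a][r] is the bit measured on ancilla a in round r.
    Returns an array of the same shape holding detection events.
    """
    events = []
    for bits in syndrome_volume:
        previous = 0              # assumed all-zero reference before round 0
        row = []
        for bit in bits:
            row.append(bit ^ previous)
            previous = bit
        events.append(row)
    return events


if __name__ == "__main__":
    measurement_glitch = [[0, 0, 1, 0, 0]]   # one faulty readout in round 2
    data_error = [[0, 1, 1, 1, 1]]           # data error persisting from round 1
    print(detection_events(measurement_glitch))
    print(detection_events(data_error))
```

In this representation a single faulty readout produces a pair of events in adjacent rounds, whereas a data error that persists produces a single event, which is what lets a decoder tell the two apart.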
Section 4

Decoder Selection

The EpochCore platform offers three decoder strategies, each optimized for different workload characteristics. The fidelity-first routing engine selects the optimal decoder based on circuit depth, required fidelity, and latency constraints.

Decoder         Latency   Max Distance  Accuracy  Hardware        Best For
Steane Code     0.2 μs    d=5           99.2%     CPU             Low-latency, shallow circuits
NVIDIA QLDPC    1.8 μs    d=13          99.8%     GPU (RTX 5090)  High-distance, deep circuits
Tensor Network  12.4 μs   d=9           99.95%    GPU (RTX 5090)  Maximum fidelity, research
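
The trade-offs in the table can be expressed as a simple selection rule. The sketch below is not the routing engine itself: the rule it applies (pick the most accurate decoder that supports the requested distance within a latency budget) is an assumption for illustration, and `DecoderSpec` and `select_decoder` are hypothetical names. Only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderSpec:
    name: str
    latency_us: float
    max_distance: int
    accuracy: float


# Values copied from the decoder table above.
DECODERS = (
    DecoderSpec("Steane Code", 0.2, 5, 0.992),
    DecoderSpec("NVIDIA QLDPC", 1.8, 13, 0.998),
    DecoderSpec("Tensor Network", 12.4, 9, 0.9995),
)


def select_decoder(distance: int, latency_budget_us: float) -> DecoderSpec:
    """Most accurate decoder that supports `distance` within the budget."""
    feasible = [
        spec for spec in DECODERS
        if spec.max_distance >= distance and spec.latency_us <= latency_budget_us
    ]
    if not feasible:
        raise ValueError("no decoder satisfies the constraints")
    return max(feasible, key=lambda spec: spec.accuracy)


if __name__ == "__main__":
    print(select_decoder(distance=5, latency_budget_us=0.5).name)
    print(select_decoder(distance=5, latency_budget_us=5.0).name)
```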

4.1 Steane Decoder

The Steane decoder uses a lookup-table approach for small code distances, mapping each syndrome pattern directly to a correction operator. For distance-3, the lookup table contains $2^8 = 256$ entries, one for each pattern of the 8 syndrome bits. For distance-5, we use a compressed table with Hamming-weight partitioning. This decoder runs entirely on CPU and achieves sub-microsecond decoding latency, making it ideal for real-time error correction in latency-sensitive applications.
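
A lookup table of this size can be generated by brute force for distance 3. The sketch below does so for an assumed stabilizer layout (one common labelling of the rotated distance-3 patch, not necessarily the one deployed), mapping every 8-bit syndrome to a lowest-weight pair of X and Z corrections. It illustrates the table-building idea only and makes no claim about how the production table is constructed or compressed.

```python
from itertools import combinations, product

# Assumed distance-3 layout on a 3x3 grid of data qubits labelled 0..8.
Z_STABILIZERS = [(0, 1, 3, 4), (4, 5, 7, 8), (2, 5), (3, 6)]
X_STABILIZERS = [(1, 2, 4, 5), (3, 4, 6, 7), (0, 1), (7, 8)]
NUM_DATA = 9


def syndrome(error_support, stabilizers):
    """Parity of the overlap between an error and each stabilizer."""
    errs = set(error_support)
    return tuple(len(errs & set(s)) % 2 for s in stabilizers)


def min_weight_table(stabilizers):
    """Map each 4-bit syndrome to a lowest-weight error that produces it."""
    table = {}
    for weight in range(NUM_DATA + 1):          # lighter errors are visited first
        for support in combinations(range(NUM_DATA), weight):
            table.setdefault(syndrome(support, stabilizers), support)
    return table


# X errors are detected by Z-stabilizers, and Z errors by X-stabilizers.
X_CORRECTIONS = min_weight_table(Z_STABILIZERS)
Z_CORRECTIONS = min_weight_table(X_STABILIZERS)

# Full table keyed by the 8-bit syndrome (4 Z-type bits, then 4 X-type bits).
LOOKUP = {
    z_syn + x_syn: (X_CORRECTIONS[z_syn], Z_CORRECTIONS[x_syn])
    for z_syn, x_syn in product(X_CORRECTIONS, Z_CORRECTIONS)
}

if __name__ == "__main__":
    print(len(LOOKUP))
```

Decoding then reduces to a single dictionary access per syndrome, which is why this approach suits low-latency use but does not extend to larger distances, where the number of syndromes grows exponentially.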

4.2 NVIDIA QLDPC Decoder

The QLDPC (Quantum Low-Density Parity-Check) decoder leverages GPU parallelism on NVIDIA RTX 5090 hardware to perform belief-propagation decoding. This approach scales efficiently to higher code distances and handles correlated errors that the Steane decoder cannot. The decoder processes syndrome volumes as sparse matrices, achieving O(n log n) scaling with qubit count.

NVIDIA QLDPC Decoder Configuration
from epochcore.qec.decoders import NvidiaQLDPC

decoder = NvidiaQLDPC(
    code_distance=7,
    bp_iterations=50,
    osd_order=10,
    device="cuda:0",       # RTX 5090
    batch_size=1024,      # Decode 1024 syndromes in parallel
    channel_model="depolarizing"
)

# Decode syndrome volume
corrections = decoder.decode(syndrome_volume)
# corrections: Pauli frame tracking for each logical qubit

4.3 Tensor Network Decoder

The Tensor Network decoder contracts a tensor network representation of the error probability distribution, computing exact maximum-likelihood corrections. While computationally more expensive, this decoder achieves the highest accuracy of any available strategy and is recommended for research workloads where fidelity is the primary concern. GPU acceleration via cuTensorNet reduces contraction time by up to 100x compared to CPU implementations.
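
For a distance-3 patch, the maximum-likelihood decision that a tensor network contraction computes can be reproduced by exhaustive enumeration, which makes the underlying idea easy to inspect. The sketch below is a brute-force stand-in, not a tensor network and not the platform's decoder. It assumes independent X errors with probability `p` per data qubit, an assumed stabilizer layout, and an assumed logical Z operator on qubits 0, 1, 2. It sums the probability of all errors consistent with a syndrome, grouped by whether they flip the logical qubit, and reports the more probable class.

```python
from itertools import combinations

# Assumed distance-3 layout on a 3x3 grid of data qubits labelled 0..8.
Z_STABILIZERS = [(0, 1, 3, 4), (4, 5, 7, 8), (2, 5), (3, 6)]
LOGICAL_Z = (0, 1, 2)   # an X error with odd overlap here flips the logical qubit
NUM_DATA = 9


def parity(support, operator):
    """Parity of the overlap between two sets of qubits."""
    return len(set(support) & set(operator)) % 2


def ml_decode_x(z_syndrome, p):
    """Return (most_likely_class, [P(class 0), P(class 1)]) for X errors.

    Class 1 means the most probable explanation of the syndrome includes a
    logical flip, so the correction should be drawn from that class.
    """
    target = tuple(z_syndrome)
    totals = [0.0, 0.0]
    for weight in range(NUM_DATA + 1):
        for support in combinations(range(NUM_DATA), weight):
            if tuple(parity(support, s) for s in Z_STABILIZERS) != target:
                continue
            prob = p ** weight * (1 - p) ** (NUM_DATA - weight)
            totals[parity(support, LOGICAL_Z)] += prob
    norm = sum(totals)
    best = 0 if totals[0] >= totals[1] else 1
    return best, [t / norm for t in totals]


if __name__ == "__main__":
    print(ml_decode_x((1, 0, 0, 0), p=0.01))
```

Enumeration costs time exponential in the number of data qubits, which is the cost that tensor network contraction is designed to reduce at larger distances.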

Section 5

Performance Results

We benchmark the QEC system across all six Heron R3 backends using randomized benchmarking circuits at varying depths. The following results represent median values across 10,000 shots per configuration, measured over a 30-day production window.

Configuration            Physical Error Rate    Logical Error Rate     Improvement  Qubits Used
No QEC (baseline)        $2.1 \times 10^{-3}$   $2.1 \times 10^{-3}$   1x           1
Distance-3 / Steane      $2.1 \times 10^{-3}$   $4.2 \times 10^{-4}$   5x           17
Distance-5 / Steane      $2.1 \times 10^{-3}$   $8.7 \times 10^{-6}$   241x         49
Distance-5 / QLDPC       $2.1 \times 10^{-3}$   $2.1 \times 10^{-6}$   1,000x       49
Distance-5 / Tensor Net  $2.1 \times 10^{-3}$   $1.1 \times 10^{-6}$   1,909x       49

The distance-5 surface code with NVIDIA QLDPC decoding reaches a logical error rate of $2.1 \times 10^{-6}$, meeting our target of order $10^{-6}$ and representing a 1,000x improvement over the raw physical error rate. The Tensor Network decoder further reduces logical errors by approximately 2x, to $1.1 \times 10^{-6}$, at the cost of higher decoding latency, which is acceptable for non-real-time workloads.
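
The improvement factors in the results table are the ratio of the physical error rate to the logical error rate for each configuration. The short check below recomputes them from the tabulated rates; the variable names are chosen for this example.

```python
PHYSICAL_ERROR_RATE = 2.1e-3

# Logical error rates copied from the results table.
LOGICAL_ERROR_RATES = {
    "Distance-3 / Steane": 4.2e-4,
    "Distance-5 / Steane": 8.7e-6,
    "Distance-5 / QLDPC": 2.1e-6,
    "Distance-5 / Tensor Net": 1.1e-6,
}


def improvement(logical_rate, physical_rate=PHYSICAL_ERROR_RATE):
    """Ratio of physical to logical error rate."""
    return physical_rate / logical_rate


IMPROVEMENTS = {
    name: round(improvement(rate)) for name, rate in LOGICAL_ERROR_RATES.items()
}

if __name__ == "__main__":
    for name, factor in IMPROVEMENTS.items():
        print(f"{name}: {factor}x")
```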

Across all six backends, we observe consistent performance within 15% of the median values reported above, confirming that the fidelity-first routing engine effectively compensates for per-backend calibration variations.

Section 6

Conclusion

We have demonstrated a production-grade surface code QEC implementation on IBM Heron R3 hardware that achieves logical error rates on the order of $10^{-6}$, enabling practical quantum advantage for enterprise workloads. The three-decoder strategy provides flexibility across the latency-fidelity tradeoff, while the automated fidelity-first routing engine ensures optimal backend and decoder selection without manual intervention.

Future work will focus on extending code distance to d=7 and d=9 as backend qubit counts increase, implementing real-time decoder switching based on mid-circuit syndrome statistics, and exploring concatenated codes that combine surface codes with outer codes for even lower logical error rates.

The EpochCore QEC layer is available to all platform users via the Python SDK and REST API, with decoder selection configurable per-circuit or delegated to the automatic routing engine.