Due to the high error rate of a qubit, detecting and correcting errors on it is essential for fault-tolerant quantum computing (FTQC). Surface code (SC) associated with its decoding algorithm is one of the most promising quantum error correction (QEC) methods because it has high fidelity and requires only nearest neighbor qubits connectivity. To realize FTQC, we need a decoder circuit capable of not only QEC in a 3-D lattice to deal with errors in measurement on ancillary qubits but also quantum operations on logically constructed qubits. Whereas several methods to perform logical operations on SC, such as lattice surgery (LS), are known, no practical decoders supporting them have been proposed yet.One of the most promising QC implementations today is made up of superconducting qubits that are located in a cryogenic environment. To reduce the hardware complexity of QC and latency of QEC, we are supposed to perform QEC in a cryogenic environment. Hence a power-efficient decoder is required due to the limited power budget inside a dilution refrigerator.In this paper, we propose an online-QEC algorithm that supports LS with a practical decoder circuit, as well as a new FTQC architecture. We design a key building block of the proposed architecture with a hybrid of SFQ- and Cryo-CMOS-based digital circuits and evaluate it with a SPICE-level simulation. Each logic element includes about 2400 Josephson junctions, and power consumption is estimated to be 2.07 μW when operating with a 2 GHz clock frequency. We evaluate the decoder performance by a quantum error simulator for an essential operation of LS with code distances up to 11, and it achieves a 0.6% accuracy threshold. In an LS-based architecture further supporting a magic-state distillation protocol, which is expected to run for near-term universal quantum computing, we evaluate the QEC performance and power consumption of the architecture and show that it is practical to be operated in 4-K temperature region of a dilution refrigerator.