Although convolutional neural networks (CNNs) offer abundant parallelism, traditional layer-by-layer task division for multi-FPGA systems has two problems: (1) the computational load differs from layer to layer, so the execution time is dominated by the heaviest layer; (2) each FPGA must be designed independently, which means that separate configuration files must be designed, generated, and managed. To address these problems, we propose a horizontal division method that allows a single design to be used for every FPGA. All layers of the target CNN are divided in the horizontal direction, and one set of the divided layers is implemented on each FPGA. This reduces design time as well as the management cost at execution. In addition, because the weight data can be partitioned across FPGAs, the local memory usage of each FPGA is reduced. The obvious disadvantage of this method is that it requires all-to-all data communication between FPGA boards, so it is not suitable for traditional multi-FPGA systems with a simple linear network. Here, we apply the method to FiC (Flow-in-Cloud), which has a powerful network that enables efficient broadcasting. A simple CNN, LeNet, and a matrix multiplication representing a more practical fully connected layer are implemented on the FiC prototype. In the evaluation, LeNet on 8 FPGAs ran 7.5 times faster than on a single FPGA and 12.6 times faster than optimized software on a high-end CPU.
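The idea of horizontal division can be illustrated with a small software sketch. It is not the FPGA implementation. It assumes that "horizontal" means splitting the output neurons of each fully connected layer evenly across devices, and it models the all-to-all exchange as a plain concatenation of the per-device slices. The names `split_rows`, `horizontal_layer`, and `num_devices` are hypothetical and chosen only for illustration.

```python
# Sketch of horizontal division for fully connected layers (pure Python).
# Assumption: each device holds an equal block of output rows of every layer,
# so all devices run the same code on different weight shards.

def matvec(weights, x):
    """Dense matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def split_rows(weights, num_devices):
    """Give each device a contiguous block of output rows."""
    rows_per_device, remainder = divmod(len(weights), num_devices)
    if remainder != 0:
        raise ValueError("row count must be divisible by num_devices in this sketch")
    return [weights[d * rows_per_device:(d + 1) * rows_per_device]
            for d in range(num_devices)]

def horizontal_layer(weights, x, num_devices):
    """Compute one layer with its weights partitioned across devices."""
    shards = split_rows(weights, num_devices)
    # Each device computes its slice of the output from its own weights only.
    partials = [matvec(shard, x) for shard in shards]
    # All-to-all exchange, modeled as concatenation: every device needs the
    # full output vector as input to its slice of the next layer.
    return [y for part in partials for y in part]

def forward(layers, x, num_devices):
    """Run every layer in sequence, each one divided horizontally."""
    for weights in layers:
        x = horizontal_layer(weights, x, num_devices)
    return x

# Two small integer layers: 6 inputs -> 8 outputs -> 4 outputs.
layer1 = [[(r + c) % 5 for c in range(6)] for r in range(8)]
layer2 = [[(r * c) % 3 for c in range(8)] for r in range(4)]
x = [1, 2, 3, 4, 5, 6]

divided = forward([layer1, layer2], x, num_devices=4)
reference = matvec(layer2, matvec(layer1, x))
rows_per_device = [len(s) for s in split_rows(layer1, 4)]
```

In this sketch each of the four devices stores only a quarter of each layer's weights, which corresponds to the reduced local memory usage, while the concatenation step after every layer corresponds to the all-to-all communication that motivates a broadcast-capable network.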