4K Real Time Image to Image Translation Network with Transformers

Kei Shibasaki, Shota Fukuzaki, Masaaki Ikehara

Research output: Contribution to journal › Article › peer-review

Abstract

CNNs have traditionally been used in computer vision. Recently, applying Transformer networks, originally a natural language processing technique, to computer vision has received much attention and produced superior results. However, Transformers and their derivatives have the drawback that computational cost and memory usage increase rapidly with image resolution. In this paper, we propose the Laplacian Pyramid Translation Transformer (LPTT) for image-to-image translation. The Laplacian Pyramid Translation Network, on which this work builds, creates a Laplacian pyramid of the input image and processes each component with CNNs. In contrast, LPTT transforms the high-frequency components with CNNs and the low-frequency component with Axial Transformer blocks. LPTT thereby retains the Transformer's expressive power while reducing computational cost and memory usage. LPTT significantly improves both the quality of generated images and the inference speed for high-resolution images over conventional methods. LPTT is the first network with a Transformer that can perform practical real-time inference on 4K-resolution images. LPTT can also process 8K images in real time, depending on the model configuration and GPU performance. The ablation study in this paper suggests that, even when processing high-resolution images, computing the low-resolution component with a Transformer improves performance while maintaining inference speed. LPTT improves PSNR by 0.41 dB on the MIT-Adobe FiveK dataset. The greater the number of layers in the Laplacian pyramid, the greater the improvement of LPTT over the Laplacian Pyramid Translation Network.
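The Laplacian pyramid decomposition that the abstract relies on can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 2x2 average pooling and nearest-neighbour upsampling are assumed stand-ins for whatever filters the paper uses, and the function names, the four-level depth, and the small random test image are hypothetical. The point is only to show that the pyramid splits an image into full-resolution high-frequency residuals plus one small low-frequency image, and that the split is invertible.

```python
import numpy as np

def downsample(img):
    """Halve height and width by 2x2 average pooling (assumed filter)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(img):
    """Double height and width by nearest-neighbour repetition (assumed filter)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, num_levels):
    """Return num_levels - 1 high-frequency residuals followed by one low-frequency image."""
    pyramid = []
    current = img
    for _ in range(num_levels - 1):
        smaller = downsample(current)
        # The residual keeps the detail lost by going down one level.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)  # low-frequency component
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by adding residuals back from coarse to fine."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = upsample(current) + high
    return current

# Hypothetical stand-in image; a 4K frame would be 2160 x 3840 pixels.
rng = np.random.default_rng(0)
image = rng.random((256, 512, 3))

pyramid = build_laplacian_pyramid(image, num_levels=4)
low = pyramid[-1]
restored = reconstruct(pyramid)

# Fraction of the original pixels that the low-frequency component contains.
pixel_fraction = low.size / image.size
```

With four levels, the low-frequency component has 1/64 of the original pixels. Under these assumptions, that is why routing only this component through attention blocks keeps the cost manageable, while cheaper convolutions handle the larger residuals.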

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Access
Publication status: Accepted/In press - 2022

Keywords

  • Computational efficiency
  • Deep learning
  • Image resolution
  • Image to image translation
  • Laplace equations
  • Laplacian pyramid
  • Photo retouching
  • Task analysis
  • Tensors
  • Transformer
  • Transformers
  • Transforms

ASJC Scopus subject areas

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)
  • Electrical and Electronic Engineering
