4K Real Time Image to Image Translation Network with Transformers

Kei Shibasaki, Shota Fukuzaki, Masaaki Ikehara

Research output: Article (peer-reviewed)

Abstract

CNNs have traditionally been the standard approach in computer vision. Recently, applying Transformer networks, originally a natural language processing technique, to computer vision has received much attention and produced superior results. However, Transformers and their derivatives share a drawback: their computational cost and memory usage increase rapidly with image resolution. In this paper, we propose the Laplacian Pyramid Translation Transformer (LPTT) for image-to-image translation. The Laplacian Pyramid Translation Network, the predecessor of this work, builds a Laplacian pyramid of the input image and processes every component with CNNs. In contrast, LPTT transforms the high-frequency components with CNNs and the low-frequency component with Axial Transformer blocks. LPTT thus retains the expressive power of a Transformer while reducing computational cost and memory usage. Compared with conventional methods, LPTT significantly improves both the quality of generated images and the inference speed on high-resolution images. LPTT is the first network with a Transformer that can perform practical inference in real time on 4K-resolution images. It can also process 8K images in real time, depending on the model configuration and GPU performance. The ablation study in this paper suggests that, even for high-resolution images, computing the low-resolution component with a Transformer improves performance while maintaining inference speed. LPTT improves PSNR by 0.41 dB on the MIT-Adobe FiveK dataset. The more layers the Laplacian pyramid has, the greater the improvement of LPTT over the Laplacian Pyramid Translation Network.
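The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, and a 2x2 box filter with nearest-neighbour upsampling stands in for whatever smoothing kernel the paper actually uses. It shows how a Laplacian pyramid splits an image into high-frequency residuals plus one small low-frequency component, and why running attention only on that small component is cheap. The two pair-count helpers are rough proxies for attention cost, assuming full self-attention compares every pixel with every other pixel while axial attention compares each pixel only with its own row and column.

```python
import numpy as np

def downsample(img):
    """Halve height and width by averaging each 2x2 block (assumed box filter)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample(img):
    """Double height and width by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [high_0, ..., high_{levels-1}, low] for an (H, W, C) image.

    H and W must be divisible by 2**levels.
    """
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # High-frequency residual: detail lost by downsampling at this level.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)  # low-frequency component
    return pyramid

def reconstruct(pyramid):
    """Invert build_laplacian_pyramid by upsampling and adding residuals back."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = upsample(current) + high
    return current

def full_attention_pairs(h, w):
    """Pixel pairs compared by full self-attention on an h x w grid."""
    return (h * w) ** 2

def axial_attention_pairs(h, w):
    """Pixel pairs compared by row-wise plus column-wise (axial) attention."""
    return h * w * (h + w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 96, 3))  # small stand-in for a 4K frame
    pyr = build_laplacian_pyramid(image, levels=3)
    print([level.shape for level in pyr])
    print(np.allclose(reconstruct(pyr), image))

    # A 4K frame (2160 x 3840) after three halvings is 270 x 480.
    print(full_attention_pairs(2160, 3840))
    print(axial_attention_pairs(270, 480))
```

In this sketch the high-frequency residuals keep the full spatial resolution, which is where the abstract places the CNN branches, while the low-frequency component is 8 times smaller per side after three levels, which is where the Axial Transformer blocks would operate.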

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Access
Publication status: Accepted/In press - 2022

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
  • Electrical and Electronic Engineering
