Robust camera pose estimation by viewpoint classification using deep learning

Yoshikatsu Nakajima, Hideo Saito

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases generated for each partitioned viewpoint, in which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class for each input image using deep learning based on a set of training images prepared for each viewpoint class. We give two ways to prepare these images for deep learning and generating databases. In the first method, images are generated using a projection matrix to ensure robust learning in a range of environments with changing backgrounds. The second method uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.
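The abstract outlines a pipeline: classify the input image into a viewpoint class with a CNN, select the descriptor database built for that class, match keypoint descriptors, then estimate pose from the correspondences. A minimal, illustrative sketch under stated assumptions — the distance-based stand-in classifier, the function names, and the toy data below are assumptions for exposition, not the authors' implementation (the paper uses a CNN classifier and SIFT descriptors):

```python
# Illustrative sketch of a viewpoint-classified matching pipeline.
# The hemisphere around the target is partitioned into viewpoint classes,
# each with its own descriptor database, so that within one class the
# descriptor of each keypoint stays nearly invariant.

from math import dist  # Euclidean distance (Python 3.8+)

def classify_viewpoint(image_feature, class_centroids):
    """Stand-in for the CNN viewpoint classifier: return the class whose
    centroid is nearest to a global image feature vector."""
    return min(class_centroids,
               key=lambda c: dist(image_feature, class_centroids[c]))

def match_descriptors(query, database, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test, as commonly
    used with SIFT-style descriptors. Returns (query_idx, db_idx) pairs."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted(range(len(database)),
                        key=lambda i: dist(q, database[i]))
        best, second = ranked[0], ranked[1]
        # Accept only if the best match is clearly better than the runner-up.
        if dist(q, database[best]) < ratio * dist(q, database[second]):
            matches.append((qi, best))
    return matches

# Per-viewpoint-class descriptor databases (toy 2-D "descriptors").
databases = {
    "front": [(0.0, 0.0), (5.0, 5.0)],
    "left":  [(9.0, 1.0), (2.0, 8.0)],
}
centroids = {"front": (0.0, 0.0), "left": (1.0, 0.0)}

cls = classify_viewpoint((0.1, 0.0), centroids)
matches = match_descriptors([(0.1, 0.0)], databases[cls])
```

In a full implementation, the matched 2D keypoints and their stored 3D coordinates would then be fed to a PnP solver (e.g., RANSAC-based PnP) to recover the camera pose.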

Original language: English
Pages (from-to): 189-198
Number of pages: 10
Journal: Computational Visual Media
Volume: 3
Issue number: 2
DOIs: 10.1007/s41095-016-0067-z
Publication status: Published - 2017 Jun 1

Keywords

  • augmented reality (AR)
  • convolutional neural network
  • deep learning
  • pose estimation

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Robust camera pose estimation by viewpoint classification using deep learning. / Nakajima, Yoshikatsu; Saito, Hideo.

In: Computational Visual Media, Vol. 3, No. 2, 01.06.2017, p. 189-198.

@article{f70b0ae44d874af5bf4a2b390ab03b2c,
title = "Robust camera pose estimation by viewpoint classification using deep learning",
abstract = "Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases generated for each partitioned viewpoint, in which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class for each input image using deep learning based on a set of training images prepared for each viewpoint class. We give two ways to prepare these images for deep learning and generating databases. In the first method, images are generated using a projection matrix to ensure robust learning in a range of environments with changing backgrounds. The second method uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.",
keywords = "augmented reality (AR), convolutional neural network, deep learning, pose estimation",
author = "Yoshikatsu Nakajima and Hideo Saito",
year = "2017",
month = jun,
day = "1",
doi = "10.1007/s41095-016-0067-z",
language = "English",
volume = "3",
pages = "189--198",
journal = "Computational Visual Media",
issn = "2096-0433",
publisher = "Tsinghua University Press",
number = "2",

}

TY - JOUR

T1 - Robust camera pose estimation by viewpoint classification using deep learning

AU - Nakajima, Yoshikatsu

AU - Saito, Hideo

PY - 2017/6/1

Y1 - 2017/6/1

N2 - Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases generated for each partitioned viewpoint, in which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class for each input image using deep learning based on a set of training images prepared for each viewpoint class. We give two ways to prepare these images for deep learning and generating databases. In the first method, images are generated using a projection matrix to ensure robust learning in a range of environments with changing backgrounds. The second method uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.

AB - Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality (AR). However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases generated for each partitioned viewpoint, in which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class for each input image using deep learning based on a set of training images prepared for each viewpoint class. We give two ways to prepare these images for deep learning and generating databases. In the first method, images are generated using a projection matrix to ensure robust learning in a range of environments with changing backgrounds. The second method uses real images to learn a given environment around a planar pattern. Our evaluation results confirm that our approach increases the number of correct matches and the accuracy of camera pose estimation compared to the conventional method.

KW - augmented reality (AR)

KW - convolutional neural network

KW - deep learning

KW - pose estimation

UR - http://www.scopus.com/inward/record.url?scp=85038834569&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85038834569&partnerID=8YFLogxK

U2 - 10.1007/s41095-016-0067-z

DO - 10.1007/s41095-016-0067-z

M3 - Article

AN - SCOPUS:85038834569

VL - 3

SP - 189

EP - 198

JO - Computational Visual Media

JF - Computational Visual Media

SN - 2096-0433

IS - 2

ER -