Constant velocity 3d convolution

Yusuke Sekikawa, Kohta Ishikawa, Hideo Saito

Research output: Article

Abstract

We propose a novel 3-D convolution method, cv3dconv, for extracting spatiotemporal features from videos. It reduces the number of sum-of-products operations in 3-D convolution by a factor of thousands by assuming a constant moving velocity of the features. We observed that a specific class of video sequences, such as video captured by an in-vehicle camera, can be well approximated by piece-wise linear movements of 2-D features along the temporal dimension. Our principal finding is that a 3-D kernel characterized by a constant velocity can be decomposed into the convolution of a 2-D shape kernel and a 3-D velocity kernel, which is parameterized using only two parameters. We derived an efficient recursive algorithm for this class of 3-D convolution, which is exceptionally well suited for sparse spatiotemporal data, and this parameterized decomposed representation imposes a structured regularization along the temporal direction. We experimentally verified the validity of our approximation using a controlled dataset, and we also showed the effectiveness of cv3dconv by adopting it in deep neural networks (DNNs) for a visual-odometry estimation task using a publicly available event-based camera dataset captured in urban road scenes. Our DNN architecture improves the estimation accuracy by about 30% compared with the existing state-of-the-art architecture designed for event data.
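
For intuition, the kernel decomposition described in the abstract can be sketched in a few lines of NumPy. The snippet below is an illustrative reconstruction from the abstract, not the authors' implementation; the function and parameter names (build_cv3d_kernel, vx, vy, T) are hypothetical. It builds a constant-velocity 3-D kernel by placing one shared 2-D shape kernel at positions displaced by (vx, vy) per time step, which is equivalent to convolving the 2-D kernel with a 3-D velocity kernel made of time-shifted impulses, so the full spatiotemporal kernel is governed by the 2-D kernel plus only two velocity parameters.

    import numpy as np

    def build_cv3d_kernel(k2d, vx, vy, T):
        # Sketch of the decomposition: stack T copies of a shared 2-D kernel,
        # each shifted by (vy*t, vx*t) pixels. This equals convolving k2d with
        # a velocity kernel of time-shifted impulses parameterized by only
        # (vx, vy). Integer shifts and wrap-around via np.roll are
        # simplifications for illustration.
        H, W = k2d.shape
        k3d = np.zeros((T, H, W))
        for t in range(T):
            dy, dx = int(round(vy * t)), int(round(vx * t))
            k3d[t] = np.roll(np.roll(k2d, dy, axis=0), dx, axis=1)
        return k3d

    # Example: a 3x3 Laplacian edge kernel drifting one pixel right per frame.
    k2d = np.array([[0., 1., 0.],
                    [1., -4., 1.],
                    [0., 1., 0.]])
    k3d = build_cv3d_kernel(k2d, vx=1.0, vy=0.0, T=4)
    print(k3d.shape)  # (4, 3, 3)

Because every temporal slice reuses the same 2-D kernel, a full T x H x W kernel never has to be learned or applied slice by slice; this is where the reduction in sum-of-products operations and the structured temporal regularization come from.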

Original language: English
Article number: 8543783
Pages (from-to): 76490-76501
Number of pages: 12
Journal: IEEE Access
Volume: 6
DOI: 10.1109/ACCESS.2018.2883340
Publication status: Published - 1 Jan 2018

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)

Cite this

Sekikawa, Y., Ishikawa, K., & Saito, H. (2018). Constant velocity 3d convolution. IEEE Access, 6, 76490-76501. [8543783]. https://doi.org/10.1109/ACCESS.2018.2883340

Constant velocity 3d convolution. / Sekikawa, Yusuke; Ishikawa, Kohta; Saito, Hideo.

In: IEEE Access, Vol. 6, 8543783, 01.01.2018, p. 76490-76501.

Research output: Article

Sekikawa, Y, Ishikawa, K & Saito, H 2018, 'Constant velocity 3d convolution', IEEE Access, vol. 6, 8543783, pp. 76490-76501. https://doi.org/10.1109/ACCESS.2018.2883340
Sekikawa, Yusuke ; Ishikawa, Kohta ; Saito, Hideo. / Constant velocity 3d convolution. In: IEEE Access. 2018 ; Vol. 6. pp. 76490-76501.
@article{c2fea70353b54db1aeeb8a05236b7681,
title = "Constant velocity 3d convolution",
keywords = "3D convolution, Convolutional neural network, event-based camera, spatiotemporal convolution",
author = "Yusuke Sekikawa and Kohta Ishikawa and Hideo Saito",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/ACCESS.2018.2883340",
language = "English",
volume = "6",
pages = "76490--76501",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Constant velocity 3d convolution

AU - Sekikawa, Yusuke

AU - Ishikawa, Kohta

AU - Saito, Hideo

PY - 2018/1/1

Y1 - 2018/1/1

KW - 3D convolution

KW - Convolutional neural network

KW - event-based camera

KW - spatiotemporal convolution

UR - http://www.scopus.com/inward/record.url?scp=85057394864&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057394864&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2018.2883340

DO - 10.1109/ACCESS.2018.2883340

M3 - Article

AN - SCOPUS:85057394864

VL - 6

SP - 76490

EP - 76501

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

M1 - 8543783

ER -