Time-sequential action recognition using pose-centric learning for action-transition videos

Tomoyuki Suzuki, Yoshimitsu Aoki

Research output: Contribution to journal › Article

Abstract

In this paper, we propose a method of human action recognition for videos in which actions transition continuously. First, we build a pose estimator that learns joint coordinates using Convolutional Neural Networks (CNNs) and extract features from its intermediate layers. Second, we train an action recognizer based on Long Short-Term Memory (LSTM), using pose features and environmental features as inputs; for this training we propose Pose-Centric Learning. In addition, from the pose features we compute an attention map that represents the element-wise importance of the environmental features, and filter the latter with this attention to make them more effective. When modeling the action recognizer, we adopt a hierarchical LSTM architecture. In experiments, we compared our method against a conventional method and achieved a 15.7% improvement on a challenging action recognition dataset.
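The attention step described in the abstract (element-wise importance of the environmental features, computed from the pose features) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, the weight matrix `W_att`, and the sigmoid gating are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative feature dimensions (not from the paper)
pose_dim, env_dim = 8, 6

# Hypothetical learned parameters mapping pose features to attention weights
W_att = rng.normal(scale=0.1, size=(env_dim, pose_dim))
b_att = np.zeros(env_dim)

def attend_environment(pose_feat, env_feat):
    """Compute element-wise attention from the pose feature and use it
    to filter the environmental feature before feeding the recognizer."""
    attention = sigmoid(W_att @ pose_feat + b_att)  # each element in (0, 1)
    filtered_env = attention * env_feat             # element-wise filtering
    # The recognizer input at each time step is the pose feature
    # concatenated with the attention-filtered environmental feature.
    return np.concatenate([pose_feat, filtered_env])

pose_feat = rng.normal(size=pose_dim)
env_feat = rng.normal(size=env_dim)
x_t = attend_environment(pose_feat, env_feat)
print(x_t.shape)  # (pose_dim + env_dim,)
```

In this sketch the attention acts as a soft gate: elements of the environmental feature that the pose deems unimportant are scaled toward zero before the sequence model sees them.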

Original language: English
Pages (from-to): 1156-1165
Number of pages: 10
Journal: Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering
Volume: 83
Issue number: 12
DOI: 10.2493/jjspe.83.1156
Publication status: Published - 2017 Jan 1

Keywords

  • Action recognition
  • Neural network
  • Time-sequential analysis
  • Video analysis

ASJC Scopus subject areas

  • Mechanical Engineering

Cite this

@article{0eb9f49e7d754ff382a06e2786338ab1,
title = "Time-sequential action recognition using pose-centric learning for action-transition videos",
abstract = "In this paper, we propose a method of human action recognition for videos in which actions transition continuously. First, we build a pose estimator that learns joint coordinates using Convolutional Neural Networks (CNNs) and extract features from its intermediate layers. Second, we train an action recognizer based on Long Short-Term Memory (LSTM), using pose features and environmental features as inputs; for this training we propose Pose-Centric Learning. In addition, from the pose features we compute an attention map that represents the element-wise importance of the environmental features, and filter the latter with this attention to make them more effective. When modeling the action recognizer, we adopt a hierarchical LSTM architecture. In experiments, we compared our method against a conventional method and achieved a 15.7{\%} improvement on a challenging action recognition dataset.",
keywords = "Action recognition, Neural network, Time-sequential analysis, Video analysis",
author = "Tomoyuki Suzuki and Yoshimitsu Aoki",
year = "2017",
month = "1",
day = "1",
doi = "10.2493/jjspe.83.1156",
language = "English",
volume = "83",
pages = "1156--1165",
journal = "Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering",
issn = "0912-0289",
publisher = "Japan Society for Precision Engineering",
number = "12",
}

TY - JOUR

T1 - Time-sequential action recognition using pose-centric learning for action-transition videos

AU - Suzuki, Tomoyuki

AU - Aoki, Yoshimitsu

PY - 2017/1/1

Y1 - 2017/1/1

N2 - In this paper, we propose a method of human action recognition for videos in which actions transition continuously. First, we build a pose estimator that learns joint coordinates using Convolutional Neural Networks (CNNs) and extract features from its intermediate layers. Second, we train an action recognizer based on Long Short-Term Memory (LSTM), using pose features and environmental features as inputs; for this training we propose Pose-Centric Learning. In addition, from the pose features we compute an attention map that represents the element-wise importance of the environmental features, and filter the latter with this attention to make them more effective. When modeling the action recognizer, we adopt a hierarchical LSTM architecture. In experiments, we compared our method against a conventional method and achieved a 15.7% improvement on a challenging action recognition dataset.

AB - In this paper, we propose a method of human action recognition for videos in which actions transition continuously. First, we build a pose estimator that learns joint coordinates using Convolutional Neural Networks (CNNs) and extract features from its intermediate layers. Second, we train an action recognizer based on Long Short-Term Memory (LSTM), using pose features and environmental features as inputs; for this training we propose Pose-Centric Learning. In addition, from the pose features we compute an attention map that represents the element-wise importance of the environmental features, and filter the latter with this attention to make them more effective. When modeling the action recognizer, we adopt a hierarchical LSTM architecture. In experiments, we compared our method against a conventional method and achieved a 15.7% improvement on a challenging action recognition dataset.

KW - Action recognition

KW - Neural network

KW - Time-sequential analysis

KW - Video analysis

UR - http://www.scopus.com/inward/record.url?scp=85037088907&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037088907&partnerID=8YFLogxK

U2 - 10.2493/jjspe.83.1156

DO - 10.2493/jjspe.83.1156

M3 - Article

AN - SCOPUS:85037088907

VL - 83

SP - 1156

EP - 1165

JO - Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering

JF - Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering

SN - 0912-0289

IS - 12

ER -