TY - GEN
T1 - Transformer Networks for Future Person Localization in First-Person Videos
AU - Alikadic, Amar
AU - Saito, Hideo
AU - Hachiuma, Ryo
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - Reliably and accurately forecasting the future trajectories of pedestrians is necessary for systems such as autonomous vehicles or visual assistive devices to function correctly. Whereas previous state-of-the-art methods modeled social interactions with LSTMs on videos captured by a static, bird's-eye-view camera, our paper presents a new method that leverages the Transformer architecture and offers a reliable way to model future trajectories in first-person videos captured by a body-mounted camera, without having to model any social interactions. Accurately forecasting future trajectories is challenging, mainly because humans move unpredictably. We tackle this issue by using information about each target person's previous locations, scales, and dynamic poses, as well as the camera wearer's ego-motion. The proposed model predicts future trajectories in a simple way, modeling each target's trajectory separately, without complex social interactions between humans or interactions between targets and the scene. Experimental results show that our method outperforms previous state-of-the-art methods overall and yields better results in challenging situations where those methods fail.
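
The abstract describes a per-target Transformer that consumes past locations, scales, and poses together with the camera wearer's ego-motion and predicts future positions without modeling social interactions. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: all dimensions (pose_dim, ego_dim, d_model, the 10-frame horizon), the concatenation-based input fusion, and the one-shot decoding head are illustrative assumptions.

# Minimal sketch (not the authors' code) of per-target trajectory
# forecasting with a standard Transformer encoder. Each pedestrian is
# processed independently, matching the paper's no-social-interaction setup.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, pose_dim=36, ego_dim=6, d_model=128, horizon=10):
        super().__init__()
        # Per-frame input: (x, y) location + scale + 2-D pose keypoints + ego-motion.
        # Dimensions are assumptions, not values from the paper.
        in_dim = 2 + 1 + pose_dim + ego_dim
        self.embed = nn.Linear(in_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(1, 64, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        # Decode the whole future at once from the encoded history
        # (a simplification; an autoregressive decoder is equally plausible).
        self.head = nn.Linear(d_model, horizon * 2)
        self.horizon = horizon

    def forward(self, loc, scale, pose, ego):
        # loc: (B, T, 2), scale: (B, T, 1), pose: (B, T, pose_dim), ego: (B, T, ego_dim)
        x = torch.cat([loc, scale, pose, ego], dim=-1)
        h = self.embed(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h)
        out = self.head(h[:, -1])             # summarize the history with the last token
        return out.view(-1, self.horizon, 2)  # future (x, y) per predicted frame

# Usage: 10 observed frames for a batch of 4 independent targets.
model = TrajectoryTransformer()
B, T = 4, 10
pred = model(torch.randn(B, T, 2), torch.randn(B, T, 1),
             torch.randn(B, T, 36), torch.randn(B, T, 6))
print(pred.shape)  # torch.Size([4, 10, 2])
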
KW - Future person localization
KW - Trajectory forecasting
KW - Transformer networks
UR - http://www.scopus.com/inward/record.url?scp=85145261049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145261049&partnerID=8YFLogxK
DO - 10.1007/978-3-031-20716-7_14
M3 - Conference contribution
AN - SCOPUS:85145261049
SN - 9783031207150
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 179
EP - 190
BT - Advances in Visual Computing - 17th International Symposium, ISVC 2022, Proceedings
A2 - Bebis, George
A2 - Li, Bo
A2 - Yao, Angela
A2 - Liu, Yang
A2 - Duan, Ye
A2 - Lau, Manfred
A2 - Khadka, Rajiv
A2 - Crisan, Ana
A2 - Chang, Remco
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Symposium on Visual Computing, ISVC 2022
Y2 - 3 October 2022 through 5 October 2022
ER -