Spatiotemporal Video Highlight by Neural Network Considering Gaze and Hands of Surgeon in Egocentric Surgical Videos

Keitaro Yoshida, Ryo Hachiuma, Hisako Tomita, Jingjing Pan, Kris Kitani, Hiroki Kajita, Tetsu Hayashida, Maki Sugimoto

Research output: Article › peer-review

2 citations (Scopus)

Abstract

In the medical field, surgical videos can be used to teach surgical skills. Medical students and residents watch such videos to study surgical technique and to compensate for limited hands-on experience, since opportunities to join surgeries in the operating room are scarce. Recording egocentric surgical videos with a wearable camera is one way to capture a surgeon's skills in detail. However, most egocentric surgical videos are quite long; in the case of tumor removal in breast surgery, for example, the recording time often reaches 2 h. At that length, finding the important scenes is time-consuming, particularly because many surgical videos include nonessential scenes such as sterilization and preparation of tools. To extract specific scenes from a long video, we can apply scene estimation by machine learning. Furthermore, it is important to know where the surgeon is looking in order to observe the incision area in detail. In particular, it is vital to be able to zoom in on key elements, allowing viewers to see the incision area and the fine details of the necessary surgical skills. In this study, we aimed to highlight incision scenes in egocentric surgical videos in the spatiotemporal domain by using two neural networks, one for temporal and one for spatial highlights. For the temporal highlights, we designed a neural network that estimates incision scenes by learning gaze speed, hand movements, the number of hands, and background movements in egocentric surgical videos. For the spatial highlights, to estimate the important area to zoom in on, we designed a neural network that learns the surgeon's gaze on natural features of surgical scenes and outputs a probability map representing the estimated gaze area. The estimated gaze area was also used to calculate an appropriate zoom-in position and zoom-in ratio. To let users adjust the highlight parameters to their preferences, we also built a user interface for selecting the playback-speed gain and zoom-ratio gain. For evaluation, we verified the performance of the networks through a quantitative assessment and conducted a user study in which medical doctors watched an actual surgical video, obtaining a qualitative assessment of the proposed system.
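The abstract does not give the exact procedure for deriving the zoom-in position and ratio from the gaze probability map, so the following Python sketch is only an illustration of one plausible approach: take the centroid of the probability map as the zoom center and let the spread of the gaze distribution set how tight the crop can be. The function name, parameters, and the 2-sigma window are all hypothetical, not the authors' method.

```python
import numpy as np

def zoom_from_gaze_map(prob_map, frame_w, frame_h, max_zoom=3.0):
    """Illustrative sketch: derive a zoom-in center and ratio from a
    2-D gaze probability map. Assumes a roughly unimodal map; the
    paper's actual computation may differ."""
    ys, xs = np.mgrid[0:prob_map.shape[0], 0:prob_map.shape[1]]
    p = prob_map / prob_map.sum()

    # Zoom center: expectation (centroid) of the probability map.
    cy = (p * ys).sum()
    cx = (p * xs).sum()

    # Spread of the gaze distribution; a 2-sigma half-window covers
    # most of the probability mass for a roughly Gaussian map.
    sy = np.sqrt((p * (ys - cy) ** 2).sum())
    sx = np.sqrt((p * (xs - cx) ** 2).sum())
    half_h = max(2.0 * sy, 1.0)
    half_w = max(2.0 * sx, 1.0)

    # Zoom ratio: how much smaller the gaze window is than the full
    # map, clamped to [1, max_zoom] so the view never zooms out or
    # magnifies beyond a readable limit.
    zoom = min(prob_map.shape[0] / (2 * half_h),
               prob_map.shape[1] / (2 * half_w),
               max_zoom)
    zoom = max(zoom, 1.0)

    # Map the centroid from map coordinates to frame coordinates.
    cx_frame = cx * frame_w / prob_map.shape[1]
    cy_frame = cy * frame_h / prob_map.shape[0]
    return (cx_frame, cy_frame), zoom

# Usage: a map concentrated in one region yields a tight, high-zoom crop.
pm = np.zeros((90, 160))
pm[40:50, 70:90] = 1.0
center, ratio = zoom_from_gaze_map(pm, frame_w=1920, frame_h=1080)
```

In a system like the one described, the user-selected zoom-ratio gain would presumably scale the returned `ratio` before the crop is rendered, analogous to how the playback-speed gain scales the temporal highlight.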

Original language: English
Article number: 2141001
Journal: Journal of Medical Robotics Research
Volume: 7
Issue: 1
DOI
Publication status: Published - 1 Mar 2022

ASJC Scopus subject areas

  • Biomedical Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Artificial Intelligence
  • Applied Mathematics
