Recording surgery is an important technique for education and the evaluation of medical treatments. However, capturing targets such as the surgical field, surgical tools, and the surgeon's hands is almost impossible, since these targets are heavily occluded by the surgeon's head and body during surgery. We used a recording system in which multiple cameras are installed in the surgical lamp, on the assumption that at least one camera captures the target without occlusion. Because this system records multiple video sequences, we address the task of automatically selecting the best-view camera. Recently, fully supervised learning-based approaches have been proposed for this task, but they rely entirely on manual annotation of the training data. In this paper, we focus on an eye tracker mounted on the surgeon's head, which can capture the recording targets without occlusion. Employing this first-person-view video synchronized with the multiple videos from the surgical lamp, we propose a novel camera selection approach based on a self-supervised learning framework. For our experiments, we created a dataset composed of recordings of four different breast surgeries. Extensive experiments showed that our approach successfully switched to the best camera view without manual annotation and achieved accuracy competitive with conventional supervised methods. Moreover, our approach yielded effective visual representations comparable to those of state-of-the-art self-supervised learning frameworks.
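One way to picture the selection step (a minimal sketch, not the authors' actual model): treat the synchronized first-person view as a free supervision signal and, at inference time, choose the lamp camera whose visual embedding is most similar to the eye-tracker view's embedding. The `best_view_camera` helper and the toy 2-D feature vectors below are hypothetical placeholders for learned embeddings.

```python
import numpy as np

def best_view_camera(camera_feats: np.ndarray, ego_feat: np.ndarray):
    """Pick the lamp camera whose embedding has the highest cosine
    similarity to the first-person (eye-tracker) view embedding.

    camera_feats: (num_cameras, dim) array of per-camera embeddings.
    ego_feat:     (dim,) embedding of the first-person view.
    Returns (best_index, similarity_scores).
    """
    cams = camera_feats / np.linalg.norm(camera_feats, axis=1, keepdims=True)
    ego = ego_feat / np.linalg.norm(ego_feat)
    sims = cams @ ego          # cosine similarity per camera
    return int(np.argmax(sims)), sims

# Toy example: camera 1's embedding aligns best with the ego view.
feats = np.array([[1.0, 0.0],
                  [0.8, 0.6],
                  [0.0, 1.0]])
ego = np.array([0.7, 0.7])
idx, sims = best_view_camera(feats, ego)
```

In the paper's self-supervised setting, the embeddings themselves would be learned so that unoccluded lamp views agree with the first-person view; here they are fixed numbers purely for illustration.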
ASJC Scopus subject areas
- Computer Science (General)