Image processing technology is increasingly used in sports. Video-based analysis of athletes and teams enables scientific, objective evaluation that is independent of experts' subjective judgment, and it allows individuals and teams to be evaluated with the same index. As this trend shows, recent technological progress has made it possible to detect and analyze what can be visually confirmed in sports video. These technologies certainly help coaches analyze movement. However, they cannot take over the role of the coach, which is to judge what is important among the things that can be visually confirmed. A coach's judgment also has empirical and qualitative aspects, and the identification of important parts is not reproducible by anyone other than a specialist. Against this background, we considered that it would be useful to extract the more important scenes of a game using quantitative information. Such technology can be applied to various sports, but in this thesis we focus on tennis. The aim is to estimate which plays contributed most to the result of a tennis game (score or failure). In addition, we do not use supervised information obtained from an empirical point of view, in order to eliminate the dataset dependency caused by using data created from qualitative information. Specifically, we attempt to estimate attention with an unsupervised method, based on quantitative information such as athletes' movements and score results.