In recent years, the sports industry has sought to reduce the burden of data collection and video editing for tactical analysis. Achieving this requires a system that can recognize the game context. In this study, we propose a method for identifying the timing of a player's shot at the frame level in ball-striking sports, and we evaluate it by detecting players' shots in video of a tennis match. By using a recurrent neural network to take time-series information into account, shots were detected with an F-score of 87% or higher within an error tolerance of 1 frame (0.033 s). This technology is expected to be applicable not only to tennis but also to other sports that involve striking a ball, such as table tennis, baseball, and volleyball. It could also be used to detect the moment of other specific actions (for example, touching or hitting an object).
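
To make the evaluation criterion concrete, the following is a minimal sketch of how an F-score with a frame tolerance might be computed. The matching rule (each ground-truth shot paired with at most one predicted shot, in temporal order) and the function name `f_score_with_tolerance` are assumptions made for illustration; the exact matching procedure used in the study is not stated here.

```python
def f_score_with_tolerance(predicted, actual, tolerance=1):
    """F-score for event detection with a frame tolerance.

    predicted, actual: iterables of frame indices (ints) at which a shot
    was detected / annotated. A prediction counts as a true positive if it
    lies within `tolerance` frames of a not-yet-matched annotated shot.
    The one-to-one, in-order matching is an assumption of this sketch.
    """
    pred = sorted(predicted)
    true = sorted(actual)
    i = j = true_positives = 0

    # Two-pointer sweep over both sorted lists.
    while i < len(pred) and j < len(true):
        diff = pred[i] - true[j]
        if abs(diff) <= tolerance:
            # Close enough: pair them and consume both.
            true_positives += 1
            i += 1
            j += 1
        elif diff < 0:
            # Prediction is too early to match any remaining annotation.
            i += 1
        else:
            # Annotation is too early to match any remaining prediction.
            j += 1

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical frame indices, not data from the study.
predicted_frames = [30, 92, 150, 210]
annotated_frames = [31, 90, 150]
score = f_score_with_tolerance(predicted_frames, annotated_frames, tolerance=1)
```

With a tolerance of 1 frame, the prediction at frame 30 matches the annotation at frame 31, while the prediction at frame 92 is two frames away from the annotation at frame 90 and counts as a miss. Widening `tolerance` relaxes the criterion accordingly.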