| Video summarization has become an active research topic. Summaries of sports video help sports enthusiasts and professional analysts obtain the important information in a sports video quickly and effectively. This paper takes tennis video as its research object. Key frames are extracted by integrating the low-level features and mid-level semantic features of the video, and each key frame is assigned an importance coefficient by rules and an importance function. The key frames are then grouped by degree of importance at different levels and granularities, and the video summary is constructed as a collection of key frames. The proposed method, which combines the K-means algorithm with audio features to generate the video summary, is both innovative and practical. First, as preparation for key frame extraction, the tennis video is segmented into shots. This paper reviews the common shot segmentation algorithms. Considering the characteristics of tennis video and the required segmentation accuracy, it adopts a histogram-based shot segmentation algorithm to split the tennis video into single-scene shots. Experiments show that this method meets the requirements of this work. Second, clustering key frames are extracted with a clustering algorithm guided by statistical rules of tennis video. A statistical analysis of shot type, shot length, and frame count in tennis video yields the distribution rules of key frames in different scenes, and these rules determine the initial clustering values for the different shots. This avoids the sensitivity of the K-means clustering algorithm to uncertain initial values. The advantages and disadvantages of global and local features are examined, and HOG (Histogram of Oriented Gradients) features are finally selected. 
Clustering key frames are then extracted by clustering these features. Next, audio features, which serve as the mid-level features of tennis video, are extracted, and audio key frames are obtained by sampling the audio energy. Based on an analysis of the audio data of tennis video, this paper proposes an audio analysis method that uses a sliding window and an adaptive threshold. The correspondence between audio sample points and video frame numbers determines the key frame numbers in sequence, which gives the audio key frames. For the two kinds of key frames, this paper constructs an importance function that assigns an importance weight to each key frame; the key frames are grouped by weight to obtain video summaries of different sizes. Finally, a Gaussian mixture-based background model is used to extract the key frames that lie in the same shot and share the same background, and video summaries of different sizes are assembled. The experimental results show that the proposed video summarization method performs satisfactorily. |
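
The audio step described above (sliding-window energy, an adaptive threshold, and the mapping from audio sample points to video frame numbers) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the threshold rule, so the form `mean + k * std` used here is an assumption, and the function names, window size, hop size, and `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def short_time_energy(samples, window, hop):
    """Return window start indices and the mean squared amplitude per window."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < window:
        return np.array([], dtype=int), np.array([])
    n_windows = 1 + (len(samples) - window) // hop
    starts = np.arange(n_windows) * hop
    energy = np.array([np.mean(samples[s:s + window] ** 2) for s in starts])
    return starts, energy


def audio_key_frames(samples, sample_rate, fps, window=1024, hop=512, k=2.0):
    """Map high-energy audio windows to video frame numbers.

    Assumption: the adaptive threshold is mean + k * std of the window
    energies of this clip; the source does not specify the exact rule.
    """
    starts, energy = short_time_energy(samples, window, hop)
    if energy.size == 0:
        return []
    threshold = energy.mean() + k * energy.std()
    # Use the centre sample of each window that exceeds the threshold.
    centres = starts[energy > threshold] + window // 2
    # Sample index -> time in seconds -> video frame number.
    frames = np.floor(centres * fps / sample_rate).astype(int)
    return np.unique(frames).tolist()
```

For a mono signal `sig` sampled at 8000 Hz from a 25 fps video, `audio_key_frames(sig, sample_rate=8000, fps=25)` returns the sorted frame numbers whose surrounding audio is unusually loud for that clip. Because the threshold is computed from the clip's own energy statistics, it adapts to recordings with different overall volume.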