Video summarization technology produces a high-level summary of large-scale video content. It has broad prospects in industries such as transportation, healthcare, and animal husbandry, but its application in education still needs to be improved. Classroom video has complex content, a small range of target motion, and low information density per unit time, and existing video summarization methods are not accurate enough on such footage. To address this, this paper constructs an attention-based convolutional neural network to obtain a candidate frame sequence, and then removes redundant frames with an improved hash clustering algorithm to obtain the final video summary. The main research contents of this paper are as follows.

First, existing video summarization methods use a single type of input feature and cannot fully highlight the key moving targets in classroom video. To address this, a video summarization model based on DA-ResNet is proposed. It takes the image frame sequence and the optical flow frame sequence as a dual-stream input to obtain diversified video features, embeds channel and spatial attention mechanisms in a deep residual network to further strengthen the weight of targets of interest, and finally obtains an importance score for each video frame through personalized fusion, producing a more representative video summary.

Second, the summary generated by the DA-ResNet model still contains redundant information, and similar frames in classroom video cannot be removed accurately. To address this, a summary de-redundancy algorithm based on hash clustering is proposed. It integrates edge detection into the frame preprocessing module to strengthen the detail differences between adjacent frames, compensating for the fact that the perceptual hash algorithm attends only to the overall structure of a frame. It uses the Hamming distance as the criterion for adapting the number of clusters, keeps the frame with the largest image entropy in each cluster as the final summary frame, and removes the remaining redundant frames, yielding a more concise video summary.

Finally, ablation experiments and comparative analyses of the proposed algorithm are carried out on two public datasets, TVSum and SumMe, and demonstrate its superiority. In addition, to verify its effectiveness in classroom scenes, a real-world dataset covering different types of classroom video, ClassVideo, is constructed. Experiments show that the proposed model improves to varying degrees under different evaluation metrics and achieves effective extraction of classroom video summaries.
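The de-redundancy step described in the second contribution can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The abstract specifies only edge detection during preprocessing, perceptual hashing, Hamming-distance-driven adaptive clustering, and maximum-entropy frame selection. The Sobel operator, the choice to concatenate a grayscale pHash with an edge-map pHash, the sequential threshold clustering rule, the threshold value of 20, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: function names, the Sobel detector, the hash combination,
# and the clustering rule are assumptions, not the method described in the thesis.


def sobel_edges(gray):
    """Sobel gradient magnitude (assumed choice of edge detector)."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)


def resize_nearest(img, size):
    """Nearest-neighbour resampling to size x size (a simplification of pHash resizing)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def dct_matrix(n):
    """Orthonormal DCT-II basis; D @ X @ D.T is the 2-D DCT of an n x n block X."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def phash(img, hash_size=8, size=32):
    """Perceptual hash: low-frequency DCT coefficients thresholded at their median."""
    small = resize_nearest(np.asarray(img, dtype=float), size)
    d = dct_matrix(size)
    low = (d @ small @ d.T)[:hash_size, :hash_size]
    return (low > np.median(low)).flatten()


def frame_hash(gray):
    """Assumed combination: grayscale pHash (overall structure) + edge-map pHash (detail)."""
    gray = np.asarray(gray, dtype=float)
    return np.concatenate([phash(gray), phash(sobel_edges(gray))])


def hamming(a, b):
    """Number of differing bits between two boolean hash vectors."""
    return int(np.count_nonzero(a != b))


def image_entropy(gray):
    """Shannon entropy (bits) of the 8-bit intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def deduplicate(frames, threshold=20):
    """Return indices of kept frames: one max-entropy frame per cluster.

    `frames` is a list of 2-D grayscale arrays in temporal order. Assumed
    clustering rule: a new cluster starts when the Hamming distance to the
    current cluster's first frame exceeds `threshold`, so the number of
    clusters adapts to the content instead of being fixed in advance.
    """
    if not frames:
        return []
    hashes = [frame_hash(f) for f in frames]
    clusters = [[0]]
    for i in range(1, len(frames)):
        if hamming(hashes[i], hashes[clusters[-1][0]]) <= threshold:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return [max(c, key=lambda j: image_entropy(frames[j])) for c in clusters]
```

In this sketch the threshold controls how aggressively similar frames are merged: lowering it opens more clusters and keeps more frames, while raising it yields a shorter summary.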