Video Key Content Extraction And Summary Generation

Posted on: 2021-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: L G Zhou
GTID: 2518306050468824
Subject: Master of Engineering
Abstract/Summary:
With the growing popularity of smart mobile devices, a large number of videos are generated and uploaded to the Internet every day as people capture special moments or record daily life. However, many users shoot first and edit later, or upload their videos without editing them at all. Ordinary uploaders seldom think about helping other users find their videos quickly, so they rarely describe a video's main content when uploading it, for example with a text title, a video category, or content annotations that would let others retrieve the video. To retrieve videos effectively without the uploader's help, video content recognition therefore plays an important role in video retrieval. Visual recognition technology is already fairly mature, but video content recognition suffers from the high redundancy of video information, which places a heavy burden on computing resources. Before recognizing video content, we therefore use key content extraction to reduce this redundancy and make recognition more efficient, so video key content extraction plays a decisive role in the efficiency of video content recognition. Video key content consists of a series of frames or key segments, extracted from the original video, that concisely express the whole video. Reducing the original video to a few still images is called video key frame extraction; key frames are mainly used to choose a video cover and to recognize people and objects in the video. Reducing the original video to a few short clips is called video summarization, which is mainly used for video browsing and dynamic summaries. To address video key content extraction, this thesis proposes one key frame extraction method and two video summary generation methods. The main research work is as follows:
First, to extract clearer and more representative key frames from the original video, we propose a key frame extraction method based on video content edge tracking. Building on traditional edge tracking and shot detection, the method introduces maximum pooling of the edge change rate and a difference analysis of the edge change rate; together these handle abrupt jumps in the edge change rate between adjacent frames and make shot detection more stable. In addition, to extract the most representative and clearest key frame in each shot, we define a static coefficient that measures the change between video frames and use it to select the key frames. Experimental results show that the method quickly extracts key frames containing the target information, such as people and objects, at low time and computational cost.
Second, we propose a video summary generation method based on the self-attention mechanism. Borrowing the way natural language processing models temporal information, the method applies self-attention along the time axis of the video to build long-range dependencies within the frame sequence, that is, relationships between the features of different frames. The self-attention network learns global features of the video and explores the differences and connections between frame features over time. The network is trained so that its predicted distribution approaches the ground-truth video summary, and the key segments of the video are then extracted.
Third, we propose a video summarization method based on video-text correlation. It introduces a video-text similarity learning network that uses the video's text title to learn the similarity between the visual features of the video and the text features of the title. This guides the visual features to agree with the scenes, targets, and actions described in the title, builds a joint video-text space, and aggregates visual information with similar semantics. Once the visual features are aligned with the text features in this space, a clustering method merges video content with similar semantics, reducing the video's redundant information, and finally generates the key content of the video.
Keywords/Search Tags: Video Key Frame, Video Summary, Video Shot Detection, Self-Attention Mechanism, Video-Text Space
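The abstract names the ingredients of the key frame extraction method (edge change rate, its maximum pooling and difference analysis, and a per-shot static coefficient) but not their exact formulas. The following Python sketch shows one plausible way to combine them; it is not the thesis's implementation. Assumptions: each frame is already reduced to an edge map, given as a set of (row, column) edge pixels from an edge detector that is not shown; the edge change rate uses the classic entering/exiting edge-pixel ratio from shot-boundary detection; maximum pooling is read as non-maximum suppression over the ECR series, and difference analysis as a threshold on its first difference. `static_coefficient`, `r`, `k`, and `jump` are hypothetical names and values.

```python
# Hypothetical sketch of edge-change-rate shot detection and static-coefficient
# key frame selection; formulas and parameter values are assumptions, not the thesis's.
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
EdgeMap = Set[Pixel]  # assumed input: coordinates of edge pixels in one frame


def dilate(edges: EdgeMap, r: int) -> EdgeMap:
    """Grow every edge pixel into a (2r+1) x (2r+1) square neighbourhood."""
    return {(y + dy, x + dx)
            for (y, x) in edges
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)}


def edge_change_rate(prev: EdgeMap, curr: EdgeMap, r: int = 1) -> float:
    """Larger of the entering-edge and exiting-edge fractions between two frames."""
    if not prev or not curr:
        return 0.0 if prev == curr else 1.0
    entering = len(curr - dilate(prev, r))  # edges in curr with no nearby edge in prev
    exiting = len(prev - dilate(curr, r))   # edges in prev with no nearby edge in curr
    return max(entering / len(curr), exiting / len(prev))


def max_pool(values: List[float], k: int = 3) -> List[float]:
    """1-D sliding-window maximum with a window of k centred on each index."""
    half = k // 2
    return [max(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def static_coefficient(ecr: List[float], j: int, start: int, end: int) -> float:
    """Hypothetical: 1 / (1 + mean ECR between frame j and its in-shot neighbours)."""
    neighbours = []
    if j - 1 >= start:
        neighbours.append(ecr[j - 1])  # change from frame j-1 to frame j
    if j + 1 < end:
        neighbours.append(ecr[j])      # change from frame j to frame j+1
    if not neighbours:
        return 1.0                     # single-frame shot
    return 1.0 / (1.0 + sum(neighbours) / len(neighbours))


def extract_key_frames(edge_maps: List[EdgeMap], r: int = 1, k: int = 3,
                       jump: float = 0.5) -> List[int]:
    """Return the index of one key frame per detected shot."""
    n = len(edge_maps)
    if n == 0:
        return []
    # ecr[i] measures the change between frame i and frame i+1
    ecr = [edge_change_rate(edge_maps[i], edge_maps[i + 1], r) for i in range(n - 1)]
    pooled = max_pool(ecr, k)
    cuts = []
    for i, value in enumerate(ecr):
        diff = value - ecr[i - 1] if i > 0 else value  # first difference of the ECR series
        # a cut must be a local maximum (max pooling as non-maximum suppression)
        # that also rises sharply from the previous transition
        if value == pooled[i] and diff >= jump:
            cuts.append(i + 1)  # a new shot starts at frame i+1
    shots = list(zip([0] + cuts, cuts + [n]))
    # in each shot, keep the frame with the highest static coefficient (least change)
    return [max(range(s, e), key=lambda j: static_coefficient(ecr, j, s, e))
            for s, e in shots]


# Toy data: shot 1 is a horizontal edge (frame 0 lacks two edge pixels that frames 1-2 have),
# shot 2 is an unrelated edge far away.
shot_a = {(0, x) for x in range(5)}
shot_a_extra = shot_a | {(3, 0), (3, 1)}
shot_b = {(20, x) for x in range(20, 25)}
demo_frames = [shot_a, shot_a_extra, shot_a_extra, shot_b, shot_b, shot_b]
print(extract_key_frames(demo_frames))  # [2, 3]
```

The script prints `[2, 3]`: in the first shot, frame 2 has no change to its in-shot neighbour, so it wins over frame 0. Because the first difference only reacts to sharp rises, gradual transitions such as fades would need an additional rule.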
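The self-attention summarization method relates every frame to every other frame, so dependencies of any temporal range are available when scoring a frame. Below is a minimal, dependency-free sketch of that idea, assuming each frame has already been encoded as a feature vector (for example by a pretrained CNN, not shown). To keep it short, it uses Q = K = V = the raw features instead of learned projections, and a hand-set linear head (`head`, `bias`, both hypothetical) stands in for weights that a trained model would fit so that predicted scores approach the ground-truth summary annotations. Selecting fixed-length segments by mean score under a length budget (the 15% default follows a common convention in summarization benchmarks) is also an assumption, not the thesis's stated procedure.

```python
# Hypothetical sketch: untrained self-attention frame scoring plus budgeted segment selection.
import math
from typing import List, Tuple

Vector = List[float]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over the frame sequence.

    Simplification (assumption): Q = K = V = features, no learned projection matrices.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    context = []
    for q in features:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in features])
        context.append([sum(w * v[c] for w, v in zip(weights, features)) for c in range(d)])
    return context


def frame_scores(features: List[Vector], head: Vector, bias: float = 0.0) -> Vector:
    """Importance score in (0, 1) per frame: residual connection + linear head + sigmoid.

    `head` and `bias` are hypothetical stand-ins for trained regression weights.
    """
    scores = []
    for x, c in zip(features, self_attention(features)):
        z = sum(w * (xi + ci) for w, xi, ci in zip(head, x, c)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def select_segments(scores: Vector, seg_len: int,
                    budget_ratio: float = 0.15) -> List[Tuple[int, int]]:
    """Greedily keep the highest-scoring fixed-length segments within a length budget."""
    n = len(scores)
    segments = [(s, min(s + seg_len, n)) for s in range(0, n, seg_len)]
    ranked = sorted(segments, key=lambda se: sum(scores[se[0]:se[1]]) / (se[1] - se[0]),
                    reverse=True)
    budget = int(budget_ratio * n)  # maximum number of frames in the summary
    chosen, used = [], 0
    for s, e in ranked:
        if used + (e - s) <= budget:
            chosen.append((s, e))
            used += e - s
    return sorted(chosen)


demo_features = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
demo_head = [1.0, -1.0]  # hypothetical "trained" weights favouring the first feature
demo_scores = frame_scores(demo_features, demo_head)
print(select_segments(demo_scores, seg_len=2, budget_ratio=0.4))  # [(2, 4)]
```

The demo prints `[(2, 4)]`: only the segment whose frames align with the head's preferred direction fits the 2-frame budget. A trained model would replace the identity projections in `self_attention` with learned query, key, and value matrices.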
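The final step of the video-text method, merging semantically similar content once visual features live in the joint video-text space, can be sketched as below. This assumes the similarity learning network has already mapped each video segment and the title into the same embedding space; the vectors here are made up. Cosine similarity stands in for the learned similarity, greedy threshold clustering is a simple substitute for whatever clustering method the thesis uses, and keeping the segment closest to the title from each cluster is a hypothetical selection rule.

```python
# Hypothetical sketch: cluster segment embeddings and keep one title-relevant segment per cluster.
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def cluster_segments(visual: List[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Greedy one-pass clustering: a segment joins the first cluster whose centroid
    is at least `threshold` cosine-similar, otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    centroids: List[Vector] = []
    for i, v in enumerate(visual):
        for c, centroid in enumerate(centroids):
            if cosine(v, centroid) >= threshold:
                clusters[c].append(i)
                members = clusters[c]
                # recompute the centroid as the mean of all member embeddings
                centroids[c] = [sum(visual[m][d] for m in members) / len(members)
                                for d in range(len(v))]
                break
        else:  # no sufficiently similar cluster found
            clusters.append([i])
            centroids.append(list(v))
    return clusters


def summarize(visual: List[Vector], title: Vector, threshold: float = 0.9) -> List[int]:
    """Keep, from each semantic cluster, the segment most similar to the title embedding."""
    picks = [max(members, key=lambda i: cosine(visual[i], title))
             for members in cluster_segments(visual, threshold)]
    return sorted(picks)


# Made-up embeddings in an assumed shared video-text space.
demo_visual = [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0], [0.0, 1.0, 0.0],
               [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
demo_title = [0.2, 1.0, 0.0]
print(cluster_segments(demo_visual))       # [[0, 1], [2, 3], [4]]
print(summarize(demo_visual, demo_title))  # [1, 2, 4]
```

Segments 0 and 1, and segments 2 and 3, are near-duplicates, so each pair collapses to a single representative and the redundant segment is dropped.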