Research On Video Summarization Based On Frame Score

Posted on: 2020-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: X R Wang
GTID: 2428330572483893
Subject: Computer technology
Abstract/Summary:
With the popularity of video capture devices and the development of Internet technologies, people can easily share large numbers of videos online, and the volume of video data on the Internet has exploded. This growth enriches people's social lives, but it also causes problems: browsing videos takes more of users' time, video retrieval becomes slower, and video websites need a great deal of storage space. To address these issues, video summarization has emerged and attracted the attention of researchers. Video summarization selects static video frames or dynamic video segments that capture the content of the original video to form a summary. The summary is much shorter than the original video, so users can grasp the video content in less time. Video summarization can also speed up retrieval and save storage space on video websites.

Most existing video summarization methods first build mathematical models based on empirical judgments about summary attributes (such as representativeness and importance) and then use those models to score candidate summaries. Finally, the attribute scores are combined by a linear or nonlinear fusion strategy, and the result serves as the criterion for selecting the summary. However, mathematical models sometimes fail to represent the attributes of a video summary accurately, and attributes defined from personal experience can hardly meet the needs of all users. In addition, some methods score subsets of video frames and select the highest-scoring subset as the summary; because the number of subsets is usually large, scoring them takes a great deal of time. To solve these problems, this thesis proposes two video summarization methods: video summarization based on learning to rank, and video summarization based on cross-modal mutual similarity.

(1) In video summarization based on learning to rank, the score of a video frame represents the relationship between that frame and the video content; a high score indicates that the frame reflects the content well. The method extracts the frames with high scores as the summary. To make the learned ranking function conform more closely to how humans summarize videos, the frame scores produced by the algorithm are first converted into a probability distribution by a probability distribution function, and the human-annotated frame scores are converted by the same function. The difference between the two distributions is then measured by the cross-entropy loss, and the parameters that minimize this loss are taken as optimal. The resulting ranking function closely simulates human scoring. Because the method scores individual video frames rather than subsets of frames, computational complexity is markedly reduced. In addition, it imposes no constraints on the summary, and its good performance on the database indicates its effectiveness.

(2) Video summarization based on cross-modal mutual similarity mainly uses the video's text information to generate the summary. Related research shows that this text information reflects the main content of the video to some extent, so a summary can be obtained by exploiting it. The method first uses a deep learning model to compute the similarity from text to the video frame space and the similarity from video frames to the text space, and then fuses the two as the final similarity between each video frame and the text. Frames with higher similarity scores are extracted as the video summary. The method measures the information shared across modalities while also taking modality-specific information into account. It reduces computational complexity and does not constrain the summary attributes. Experimental results indicate its effectiveness.
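
The scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not name the probability distribution function or the fusion rule, so the softmax (as used in ListNet-style listwise ranking) and the weighted average below are assumptions, and all function names and the `weight` parameter are hypothetical.

```python
import math


def log_softmax(scores):
    """Log of a softmax distribution over frame scores.

    ASSUMPTION: softmax stands in for the unnamed "probability
    distribution function" in the abstract.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def listwise_cross_entropy(predicted_scores, human_scores):
    """Cross entropy between the human and predicted score distributions.

    Both score lists pass through the same distribution function, as the
    abstract describes; training would minimize this value.
    """
    p_human = [math.exp(lp) for lp in log_softmax(human_scores)]
    log_p_model = log_softmax(predicted_scores)
    return -sum(h * lq for h, lq in zip(p_human, log_p_model))


def fuse_similarities(text_to_frame, frame_to_text, weight=0.5):
    """Fuse the two directional similarities into one score per frame.

    ASSUMPTION: a weighted average; the abstract only says the two
    similarities are fused.
    """
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(text_to_frame, frame_to_text)]


def select_summary(frame_scores, k):
    """Return the indices of the k highest-scoring frames in temporal order."""
    ranked = sorted(range(len(frame_scores)),
                    key=lambda i: frame_scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    human = [3.0, 1.0, 2.0]
    print(listwise_cross_entropy([2.9, 1.1, 2.0], human))  # close to human ranking
    print(listwise_cross_entropy([1.0, 3.0, 2.0], human))  # disagrees with human ranking
    print(select_summary([0.1, 0.9, 0.4, 0.8], k=2))
```

Scoring each frame independently keeps the cost linear in the number of frames, apart from the final sort, whereas scoring candidate subsets grows combinatorially. This is the complexity argument the abstract makes for both methods.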
Keywords/Search Tags:video summarization, learning to rank, convolutional neural networks, cross-modal similarity