
Multi-modal Video Scene Segmentation Optimization Algorithm Based On Convolutional Neural Network

Posted on: 2024-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Huang
GTID: 2568307163963039
Subject: Electronic information
Abstract/Summary:
Video scene segmentation is a pivotal step in content-based video retrieval. It takes video shots as the unit of analysis, aggregates similar consecutive shots into the same scene, and divides a video into several distinct, semantically coherent logical story units. Scenes are the basis for higher-level tasks such as video retrieval and video summarization, so scene segmentation appears in a large body of mainstream video-processing research. Mainstream scene segmentation algorithms fall into three groups: graph-based methods, multi-modal fusion-based methods, and deep network-based methods. Traditional graph-based methods do not fully exploit the rich features contained in video, and this insufficient feature extraction leads to low accuracy. Deep network-based methods, in turn, often incur a heavy computational cost, which hinders their use in video retrieval.

This thesis studies video scene segmentation, keyframe extraction, and related issues in depth. The work covers two main aspects.

First, inefficient keyframe extraction yields keyframes that represent the video poorly and degrades the performance of video retrieval systems. To address this, the thesis proposes a keyframe extraction algorithm based on multi-feature fusion similarity. A combination of color histograms and a fully convolutional neural network is first used to detect shots, segmenting the video into shots with high content correlation. Keyframes are then extracted from the segmented shots with the multi-feature fusion similarity method. Finally, a deep-feature similarity method removes redundant keyframes, yielding more accurate results.

Second, because the efficiency of video scene segmentation algorithms still needs improvement, the thesis proposes a scene segmentation optimization algorithm that uses a convolutional neural network to extract multi-modal video features. The algorithm first uses a VGG19 network with a modified final fully connected layer to extract low-level features (such as visual and audio features) and high-level features from the video footage. Following the idea of multi-modal fusion, it then computes a fused similarity over these features to obtain the similarity between shots, which turns video scene segmentation into a binary classification problem over shot boundaries. Finally, a scoring-based optimization method refines the scene boundaries to complete the segmentation.

Experimental results show that the keyframes obtained by the proposed keyframe extraction algorithm summarize the video content well and can be applied to video retrieval and video summarization. The overall recall and precision reach 85.61% and 83.21%, respectively; compared with the shot-detection-based method of Bommisetty et al., both metrics improve by more than 20%. The proposed scene segmentation algorithm segments video scenes effectively, with overall recall, precision, and F value of 85.77%, 87.01%, and 86.73%, respectively. Compared with the shot-transition-graph algorithm of Kumar et al., each metric improves by about 10%; compared with the deep network-based algorithm of Ji et al., recall and F value increase by 16% and 8%, respectively.
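The shot detection step pairs color histograms with a fully convolutional network. The sketch below illustrates only the histogram half, under stated assumptions: it flags a hard cut wherever the normalized color histograms of consecutive frames differ by more than a threshold. The distance measure (half the L1 distance), the 0.5 threshold, and every function name here are hypothetical choices for illustration, and the FCN stage is not modeled.

```python
from typing import List, Sequence, Tuple


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (an all-zero histogram stays zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between normalized histograms; lies in [0, 1]."""
    a, b = normalize(h1), normalize(h2)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(frame_hists: Sequence[Sequence[float]],
                           threshold: float = 0.5) -> List[int]:
    """Return the frame indices where a new shot starts (hypothetical hard-cut rule)."""
    return [i for i in range(1, len(frame_hists))
            if histogram_distance(frame_hists[i - 1], frame_hists[i]) > threshold]


def split_into_shots(n_frames: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + boundaries
    ends = boundaries + [n_frames]
    return list(zip(starts, ends))


# Toy 3-bin color histograms: the content changes abruptly at frame 2.
hists = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]
cuts = detect_shot_boundaries(hists)
print(cuts)                                # [2]
print(split_into_shots(len(hists), cuts))  # [(0, 2), (2, 4)]
```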
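Keyframe extraction scores frames with a similarity fused from several features. The abstract does not list the features, their weights, or the selection rule, so this sketch assumes per-feature cosine similarities combined by a weighted average, plus a sequential rule that opens a new keyframe whenever a frame's fused similarity to the most recent keyframe falls below a threshold. The feature names `color` and `texture`, the weights, and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Features = Dict[str, Sequence[float]]  # feature name -> vector for one frame


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fused_similarity(f1: Features, f2: Features, weights: Dict[str, float]) -> float:
    """Weighted average of per-feature cosine similarities (assumed fusion rule)."""
    total = sum(weights.values())
    return sum(w * cosine(f1[name], f2[name]) for name, w in weights.items()) / total


def extract_keyframes(shot_frames: List[Features], weights: Dict[str, float],
                      threshold: float = 0.8) -> List[int]:
    """Keep the first frame, then every frame that differs enough from the last keyframe."""
    if not shot_frames:
        return []
    keyframes = [0]
    for i in range(1, len(shot_frames)):
        if fused_similarity(shot_frames[keyframes[-1]], shot_frames[i], weights) < threshold:
            keyframes.append(i)
    return keyframes


frames = [
    {"color": [1.0, 0.0], "texture": [1.0, 0.0]},
    {"color": [1.0, 0.1], "texture": [1.0, 0.0]},  # near-duplicate of frame 0
    {"color": [0.0, 1.0], "texture": [0.0, 1.0]},  # new content
]
print(extract_keyframes(frames, {"color": 0.6, "texture": 0.4}))  # [0, 2]
```

Normalizing by the weight total means the weights need not sum to 1, so adding a feature only requires adding its entry to the weight dictionary.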
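Redundant keyframes are then pruned by comparing deep features. Assuming each candidate keyframe already has an embedding vector from some CNN (the abstract does not say which network or layer is used at this step), a greedy filter can keep a keyframe only when its cosine similarity to every keyframe kept so far stays below a threshold. The 0.9 threshold, the function name, and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def remove_redundant_keyframes(deep_features: List[Sequence[float]],
                               threshold: float = 0.9) -> List[int]:
    """Greedily keep a keyframe only if it is not too similar to any kept one."""
    kept: List[int] = []
    for i, feat in enumerate(deep_features):
        if all(cosine(feat, deep_features[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical CNN embeddings of four candidate keyframes.
feats = [[1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(remove_redundant_keyframes(feats))  # [0, 2, 3]
```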
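For scene segmentation, the thesis fuses multi-modal features into a shot similarity and treats each gap between consecutive shots as a binary boundary/non-boundary decision. A minimal sketch, assuming one visual and one audio vector per shot (for example, pooled outputs of the modified VGG19), cosine similarity per modality, a hypothetical visual weight `alpha`, and a fixed threshold standing in for whatever classifier the thesis actually uses. The scoring-based boundary refinement is not reproduced, since the abstract does not describe it.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def shot_similarity(vis_a, vis_b, aud_a, aud_b, alpha: float = 0.7) -> float:
    """Late fusion: alpha weights the visual cue, (1 - alpha) the audio cue."""
    return alpha * cosine(vis_a, vis_b) + (1 - alpha) * cosine(aud_a, aud_b)


def label_shot_boundaries(visual: List[Sequence[float]], audio: List[Sequence[float]],
                          alpha: float = 0.7, threshold: float = 0.5) -> List[int]:
    """labels[i] = 1 if a scene boundary lies between shot i and shot i + 1."""
    labels = []
    for i in range(len(visual) - 1):
        s = shot_similarity(visual[i], visual[i + 1], audio[i], audio[i + 1], alpha)
        labels.append(1 if s < threshold else 0)
    return labels


def labels_to_scenes(labels: List[int]) -> List[Tuple[int, int]]:
    """Group shots into scenes given as inclusive (first_shot, last_shot) pairs."""
    scenes, start = [], 0
    for i, is_boundary in enumerate(labels):
        if is_boundary:
            scenes.append((start, i))
            start = i + 1
    scenes.append((start, len(labels)))  # the last shot has index len(labels)
    return scenes


visual = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
audio = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = label_shot_boundaries(visual, audio)
print(labels)                    # [0, 1, 0]
print(labels_to_scenes(labels))  # [(0, 1), (2, 3)]
```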
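The results are reported as recall, precision, and F value. The abstract does not define them or say how detected boundaries are matched to ground truth, so this sketch assumes the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F as their weighted harmonic mean (F1 when `beta` is 1). Plugging the overall scene segmentation figures quoted above (precision 87.01%, recall 85.77%) into F1 gives about 86.39%, slightly below the reported 86.73%, so the thesis may average F differently (for example, per video).

```python
from typing import Tuple


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from boundary-match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# F1 from the overall scene segmentation precision and recall quoted above.
print(round(f_measure(0.8701, 0.8577), 4))  # 0.8639
```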
Keywords/Search Tags: scene segmentation, keyframe extraction, multi-modal, convolutional neural network, similarity measure