
Research On Unsupervised Video Semantic Learning Methods Based On Deep Neural Networks

Posted on: 2023-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: X Jiang
Full Text: PDF
GTID: 2568306818495184
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the number of online video users in China has reached 900 million. Behind this huge user base is explosive growth in the number of videos on the Internet. Faced with so much video data, every video platform urgently needs a way to classify and manage it effectively. In recent years, the rapid development of deep learning, especially in the semantic analysis of images and videos, has provided a more efficient way to solve this problem. Video classification technology that identifies the content of a video can substantially reduce the cost of manual review for online platforms. However, classification tasks usually require a large number of labeled samples to train the model, a requirement that is often hard to meet in practice.

Video semantic analysis is the study of the various kinds of content information contained in a video, and this thesis focuses on analyzing the video scene. To address video scene analysis, this thesis proposes a new method that extracts features from video frames with a self-designed model and obtains key-frames by clustering. The method can handle large-scale videos, reduce the scale of video processing tasks, and effectively shorten the time needed for video content analysis. The work therefore has two parts: 1) research on a deep self-supervised clustering algorithm; 2) applying the deep self-supervised clustering algorithm to video semantic analysis in order to complete the task of key-frame extraction.

For the clustering research, this thesis proposes a self-supervised deep clustering algorithm that embeds adjacency graph features. Specifically, we first construct an adjacency graph matrix for the data, then train the encoder to obtain a deep feature representation that carries global spatial information, and use the KNN algorithm to obtain self-supervised pseudo-labels in the feature space. By combining the clustering loss, the auto-encoder reconstruction loss, and the self-supervised loss, the model retains the global structure of the data while preserving local features, and so learns features suitable for clustering. Experiments on multiple image datasets verify the importance of preserving the global structure and the effectiveness of the algorithm.

For the specific task of video key-frame extraction, we build on the above work and further refine the proposed self-supervised deep clustering model: a convolutional neural network extracts features from the video to obtain its deep feature representation. Training on the deep features proceeds in two parts: 1) self-supervised pseudo-labels are obtained by a spectral clustering algorithm in the deep feature space; 2) the auto-encoder is embedded in the deep feature space, and the pseudo-labels obtained by spectral clustering are used to supervise the clustering results of the auto-encoder to complete the clustering task. Experiments on multiple image datasets show the superiority of the clustering algorithm used by the model, and a key-frame extraction task carried out on the UCF-101 dataset verifies the effectiveness of the model.
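
The idea of obtaining key-frames by clustering frame features can be illustrated with a minimal sketch. It is not the thesis's model: plain k-means stands in for the deep self-supervised clustering network, the per-frame feature vectors are assumed to be given (in the thesis they would come from a convolutional neural network), and choosing the frame closest to each cluster centroid as the key-frame is an assumption, since the abstract does not say how a key-frame is picked from a cluster. The function names `kmeans` and `select_key_frames` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(features, k, iterations=20):
    """Plain k-means (assumption: a stand-in for the deep clustering model)."""
    if k < 1 or not features:
        raise ValueError("need at least one cluster and one frame")
    # Deterministic init: take initial centroids evenly spaced along the video.
    step = len(features) / k
    centroids = [list(features[int(i * step)]) for i in range(k)]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assign every frame to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def select_key_frames(features, k):
    """Return one frame index per non-empty cluster, in temporal order.

    Assumption: the key-frame of a cluster is the frame whose feature
    vector lies closest to the cluster centroid.
    """
    labels, centroids = kmeans(features, k)
    key_frames = []
    for c in range(k):
        member_ids = [i for i, label in enumerate(labels) if label == c]
        if member_ids:
            key_frames.append(
                min(member_ids,
                    key=lambda i: squared_distance(features[i], centroids[c]))
            )
    return sorted(key_frames)
```

Calling `select_key_frames(features, k)` with one feature vector per frame returns the indices of the selected frames, so a long video is reduced to at most `k` representative frames for later analysis.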
Keywords/Search Tags: Deep clustering, Adjacent graph features, Self-supervised learning, Spectral clustering, Key-frame extraction