
Research And Implementation Of Multimodal Micro-video Classification Method

Posted on: 2023-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
Full Text: PDF
GTID: 2568306914460314
Subject: Computer technology
Abstract/Summary:
Micro-videos have become an important carrier of Internet public opinion dissemination, and accurate, fast classification of multi-source, heterogeneous micro-video data helps support public opinion management decisions. In practical application scenarios, however, micro-video public opinion data is noisy, complex in content, heterogeneous across modalities, and distributed across platforms. The resulting multi-modal semantic conflicts and heavy classification workloads make it difficult for Internet public opinion managers to grasp, from the perspective of classification, how public opinion in a specific field is developing, which reduces the efficiency of public opinion management.

Aiming at the problem of multi-modal semantic conflict in micro-videos, this thesis designs a multi-modal micro-video semantic classification model with a multi-network structure. First, a specially designed convolution module processes the noisy image: down-sampling extracts the low-level features of the noisy image, and pixel reorganization reconstructs a denoised image. Then, building on 3D convolution, a Transformer encoder, and multi-layer perceptrons, the model deeply mines the semantics of the visual, audio, and text modalities, and alleviates semantic conflict by introducing the micro-videos' multi-modal semantic correlation information. Experiments show that the model reaches a Top-1 accuracy of 59.4% in conventional classification scenarios, 4.2 percentage points above mainstream models, and Top-1 accuracies of 55.3% and 57.0% under semantic-conflict conditions, outperforming the other models. This demonstrates that the model can accurately extract the deep semantics of micro-videos under noisy conditions, alleviate semantic conflicts, and improve the accuracy of micro-video classification. Aiming at the problem of
the large computational cost of micro-video content classification, this thesis designs a micro-video content classification model with low redundant information. First, a screening module based on the discrete cosine transform selects the low-redundancy information of the micro-video, reducing the cost of subsequent feature extraction. Second, a lightweight feature extraction module uses two-dimensional separable convolution to cut the number of model parameters, further reducing the computation required for classification and speeding it up. Experiments show that the model reaches a Top-1 accuracy of 55.1% in conventional classification scenarios while classifying up to twice as fast as the mainstream model, demonstrating that it effectively reduces the complexity of micro-video content and, with it, the amount of computation.

Finally, to address the difficulty Internet content managers face in obtaining micro-video data and its category information in specific fields easily, comprehensively, and intuitively, this thesis builds on the two models above and uses frameworks such as Scrapy and Laravel to design and implement a multi-modal micro-video classification system. The system provides complete functionality, including multi-source micro-video data collection, distributed storage, rapid classification, and visual presentation of analysis results. Functional testing shows that the system meets practical application requirements.
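Three of the building blocks named above can be illustrated with minimal NumPy sketches: pixel reorganization of down-sampled features back into a full-resolution image, DCT-based screening of redundant frames, and the parameter savings of two-dimensional separable convolution. All shapes, thresholds, and function names here are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r),
    as in sub-pixel reorganization of low-resolution features."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def screen_frames(frames, thresh):
    """Keep only frames whose 2-D DCT spectrum differs enough from the
    last kept frame; near-duplicate (redundant) frames are dropped
    before any expensive feature extraction runs."""
    t, h, w = frames.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    spectra = np.stack([dh @ f @ dw.T for f in frames])
    kept = [0]
    for idx in range(1, t):
        if np.abs(spectra[idx] - spectra[kept[-1]]).mean() > thresh:
            kept.append(idx)
    return kept

def conv_params(c_in, c_out, k, separable):
    """Weight count of a k x k convolution layer (bias omitted)."""
    if separable:  # depthwise k x k followed by pointwise 1 x 1
        return c_in * k * k + c_in * c_out
    return c_in * c_out * k * k

# pixel reorganization: 4 channels of 2x2 become 1 channel of 4x4
print(pixel_shuffle(np.arange(16).reshape(4, 2, 2), 2).shape)  # (1, 4, 4)

# frame screening: three identical frames plus one distinct frame
rng = np.random.default_rng(0)
a, b = rng.random((8, 8)), rng.random((8, 8))
print(screen_frames(np.stack([a, a, a, b]), 0.01))  # [0, 3]

# separable convolution cuts parameters roughly k^2-fold
print(conv_params(64, 128, 3, False), conv_params(64, 128, 3, True))  # 73728 8768
```

The screening step runs before feature extraction, so its cost (two small matrix products per frame) is repaid by every redundant frame the downstream network never sees.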
Keywords/Search Tags: micro-video, multimodal, feature fusion, classification, deep learning