
News Event Search System Based On Cross-modal Semantic Representation Consistency

Posted on: 2022-07-02 | Degree: Master | Type: Thesis
Country: China | Candidate: H J Guo | Full Text: PDF
GTID: 2518306341454184 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Modality refers to the form in which data exists, such as images, text, and video. Cross-modal retrieval aims to use data of one modality as a query to retrieve related data of another modality. The rapid development of the Internet has driven the explosive growth of multimedia data, which places higher demands on retrieval. Compared with traditional single-modal retrieval, cross-modal retrieval exploits the low-level feature heterogeneity and high-level semantic relevance of related data from different modalities, greatly enriching our understanding of the same object or event, and therefore has important research significance and practical value. However, existing cross-modal image-text retrieval models still have many shortcomings on the core issue of image-text alignment.

Internet search has become the main way people obtain news today. Unlike other genres, news carries important social significance, and it is mostly expressed as a combination of images and text. This paper therefore improves existing cross-modal image-text retrieval models for the scenario of news event image-text search: it constructs a multi-modal news data set of images and text, proposes an image-text matching model, MSAVT (Multi-level Semantic Alignments for Visual and Text), and designs and implements a cross-modal image-text search system for news events to meet current retrieval needs. This paper completes the following three parts of work:

1. To address the lack of a public multi-modal news data set of images and text, this paper establishes a single-modal semantic annotation model based on news event classification and builds a multi-modal news image-text data set of 5,153 image-text pairs covering 250 news events.

2. To address the limited alignment accuracy of existing cross-modal image-text retrieval models, that is, the considerable room for improvement in the relevant evaluation metrics, this paper improves the existing model in two ways: it proposes a cluster loss that simultaneously imposes intra-modal and inter-modal constraints (see the sketch after this abstract), and it adds a word detection module to the existing model to focus on word-level alignment. In addition, a pre-trained BERT model is introduced to encode the text, which improves the generalization performance of the algorithm.

3. With the proposed MSAVT model at its core, a search system is designed and implemented using front-end and back-end technologies such as Vue and SpringBoot. Compared with a single-modal search system, it returns richer results, which verifies the effectiveness of the model and demonstrates its practical value.

This paper carries out experiments on the data set obtained in part 1 and designs corresponding comparative experiments for the research work in part 2. The effectiveness of the algorithmic improvements is verified using evaluation metrics such as mAP (mean average precision) and Recall@N (the recall rate of the top N returned results).
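The cluster loss described in part 2 combines constraints across and within modalities. The sketch below is an illustrative assumption, not the thesis implementation: the function name cluster_loss, the hinge/triplet form, the margin value, and the use of event labels for the intra-modal term are choices made here for exposition only.

```python
# Illustrative sketch (assumed, not the thesis code) of a cluster-style loss that
# combines an inter-modal constraint (matched image-text pairs should score higher
# than mismatched ones) with an intra-modal constraint (samples from the same news
# event should be closer to each other than to samples from other events).
import torch
import torch.nn.functional as F

def cluster_loss(img_emb, txt_emb, labels, margin=0.2):
    """img_emb, txt_emb: (N, D) embeddings of matched image-text pairs.
    labels: (N,) event ids used for the intra-modal term."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)

    # Inter-modal constraint: hinge loss over the image-text similarity matrix.
    sim = img_emb @ txt_emb.t()                      # (N, N) cosine similarities
    pos = sim.diag().unsqueeze(1)                    # similarity of matched pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    inter = cost_i2t.mean() + cost_t2i.mean()

    # Intra-modal constraint, applied separately to each modality.
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    def intra(emb):
        s = emb @ emb.t()
        pos_pairs = same & ~mask
        if not pos_pairs.any() or same.all():
            return emb.new_zeros(())                 # degenerate batch: skip term
        return (margin + s[~same].mean() - s[pos_pairs].mean()).clamp(min=0)

    return inter + intra(img_emb) + intra(txt_emb)
```

Batches are assumed to contain samples from more than one event, and ideally several samples per event, so that both the positive and negative terms of the intra-modal constraint are non-empty.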
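The evaluation metrics mentioned above can be computed as follows. This is a generic sketch assuming one relevant text per image query and a precomputed similarity matrix; it is not taken from the thesis code.

```python
# Illustrative sketch of Recall@N and mAP for image-to-text retrieval, assuming
# sim[i, j] scores image i against text j and the matching text for image i is
# text i (one relevant item per query).
import numpy as np

def recall_at_n(sim, n):
    ranks = np.argsort(-sim, axis=1)                 # best-scoring texts first
    hits = (ranks[:, :n] == np.arange(len(sim))[:, None]).any(axis=1)
    return hits.mean()

def mean_average_precision(sim):
    # With a single relevant item per query, AP reduces to 1 / (rank of the match).
    ranks = np.argsort(-sim, axis=1)
    match_rank = np.argmax(ranks == np.arange(len(sim))[:, None], axis=1) + 1
    return (1.0 / match_rank).mean()
```

For example, recall_at_n(sim, 1), recall_at_n(sim, 5), and recall_at_n(sim, 10) give R@1, R@5, and R@10 respectively, and mean_average_precision(sim) gives mAP under the single-relevant-item assumption.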
Keywords/Search Tags: cross-modal retrieval, news event, multi-modal data set of images and text, multi-level semantic alignment