
Research on Multimodal Information-Oriented Relationship Detection

Posted on: 2023-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Lou
Full Text: PDF
GTID: 2568306836468554
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of information technology, people's daily lives have become closely tied to the Internet. Not only is a large amount of data generated every day, but that data also spans multiple modalities. The explosive growth of multimodal data such as text and images makes it difficult for users to obtain valuable information from big data. It is therefore of great significance to study how to extract information effectively from large volumes of multimodal data and to describe the correlations between multimodal information.

Traditional information extraction typically derives structured information from unstructured source text and stores it in a structured database for convenient use. This approach has long been unable to meet the needs of massive multimodal analysis. In recent years, breakthroughs in deep learning for computer vision and natural language processing have driven the rapid development of object relationship detection based on multimodal information. By studying relationship detection between objects based on multimodal information, this thesis therefore provides a theoretical basis for the construction of cross-modal knowledge graphs. Specifically, this thesis first designs a visual relationship detection model based on relational triples; building on that work, it proposes a text-description-assisted cross-modal relationship detection model; finally, the cross-modal relationship detection model is used to build a cross-modal knowledge graph. The main innovations of this thesis are as follows:

(1) This thesis proposes a visual relationship detection model based on object detection and multi-feature fusion. The proposed model organizes a visual module, a semantic module, and a loss-calculation module into an end-to-end, multi-branch, cooperating network. The visual module obtains visual features and predicts object categories with the help of object detection; the semantic module extracts object semantic features using an external semantic database; the loss-calculation module combines a softmax loss based on semantic representations with a triplet loss based on visual features, guiding the visual and semantic modules to interact with each other (a minimal code sketch of this loss combination follows the abstract). Experiments on the public multimodal Visual Genome dataset verify the advantages of the proposed network.

(2) Considering that entities and relationships obtained by single-modality information extraction suffer from high polysemy and weak expressiveness, this thesis further constructs a cross-modal relationship detection network architecture on top of the visual relationship detection model. The network introduces a text representation branch, dynamically fuses text embeddings with visual embeddings, and designs a text-visual interaction loss that guides the two modalities to maximize their commonality (sketched below), which effectively improves the robustness of the relationship detection model in complex scenes.

(3) This thesis proposes a new paradigm for constructing cross-modal knowledge graphs. By combining cross-modal relationship detection with cross-modal knowledge representation, a cross-modal knowledge graph (CMKG) is constructed. In the joint process of cross-modal relationship detection and cross-modal knowledge representation, the thesis focuses on how to obtain accurate cross-modal knowledge triples without additional auxiliary tools (a toy triple-store sketch follows the abstract). Unlike a traditional single-modality knowledge graph, the CMKG is designed to provide more reliable multimodal knowledge storage and representation for multimodal knowledge retrieval.
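For concreteness, the following is a minimal PyTorch sketch, not the thesis implementation, of the loss combination described in innovation (1): a softmax (cross-entropy) loss over predicate classes joined with a triplet loss on visual embeddings. The class name, tensor dimensions, and the weighting scheme are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationLoss(nn.Module):
    """Combines a softmax (cross-entropy) loss on predicate logits with a
    triplet loss on visual embeddings. Purely illustrative; the thesis's
    exact loss formulation is not given in the abstract."""
    def __init__(self, margin: float = 1.0, alpha: float = 0.5):
        super().__init__()
        self.semantic_loss = nn.CrossEntropyLoss()              # softmax branch
        self.visual_loss = nn.TripletMarginLoss(margin=margin)  # triplet branch
        self.alpha = alpha                                      # branch weighting (assumed)

    def forward(self, predicate_logits, predicate_labels,
                anchor, positive, negative):
        l_sem = self.semantic_loss(predicate_logits, predicate_labels)
        l_vis = self.visual_loss(anchor, positive, negative)
        return self.alpha * l_sem + (1.0 - self.alpha) * l_vis

# Toy usage with random tensors standing in for network outputs.
loss_fn = RelationLoss()
logits = torch.randn(8, 50)           # 8 object pairs, 50 predicate classes (assumed)
labels = torch.randint(0, 50, (8,))
a, p, n = (torch.randn(8, 256) for _ in range(3))
print(loss_fn(logits, labels, a, p, n).item())
```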
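Similarly, innovation (2)'s dynamic fusion of text and visual embeddings, together with an interaction loss that pulls paired embeddings toward their commonality, might look like the sketch below. The gating mechanism and the cosine-based loss are assumptions standing in for the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    """Dynamically mixes text and visual embeddings with a learned
    per-dimension gate. One plausible fusion scheme among many."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)  # produces the mixing weights

    def forward(self, text_emb, visual_emb):
        g = torch.sigmoid(self.gate(torch.cat([text_emb, visual_emb], dim=-1)))
        return g * text_emb + (1 - g) * visual_emb  # dynamic weighted fusion

def interaction_loss(text_emb, visual_emb):
    # Maximize the commonality of paired text/visual embeddings by
    # minimizing (1 - cosine similarity); an assumed, simple choice.
    return (1 - F.cosine_similarity(text_emb, visual_emb, dim=-1)).mean()

fusion = GatedFusion(dim=256)
t, v = torch.randn(8, 256), torch.randn(8, 256)
fused = fusion(t, v)
print(fused.shape, interaction_loss(t, v).item())
```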
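Finally, a toy illustration of storing the cross-modal knowledge triples from innovation (3), where each entity may carry both a text label and a visual embedding. These data structures are hypothetical; the abstract does not specify the CMKG's actual representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import torch

@dataclass
class Entity:
    label: str                                 # textual name, e.g. "person"
    visual_emb: Optional[torch.Tensor] = None  # optional visual region embedding

@dataclass
class CMKG:
    """A minimal cross-modal triple store: (subject, predicate, object)."""
    triples: List[Tuple[Entity, str, Entity]] = field(default_factory=list)

    def add(self, subj: Entity, predicate: str, obj: Entity) -> None:
        self.triples.append((subj, predicate, obj))

kg = CMKG()
kg.add(Entity("person", torch.randn(256)), "rides", Entity("horse", torch.randn(256)))
print(len(kg.triples))
```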
Keywords/Search Tags: multimodal, visual relationship detection, deep learning, knowledge graph, cross-modal knowledge graph