Research On Object Detection Method Based On Deep Learning And Relational Reasoning

Posted on: 2024-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
Full Text: PDF
GTID: 2568307157971429
Subject: Electronic information technology
Abstract/Summary:
As technology continues to develop, computer vision has gradually entered everyday life. Object detection, an active research direction, is widely used in face detection, autonomous driving, industrial production, aerospace, and other fields. The task is to classify the objects contained in an image and give the bounding box of each one. Many other computer vision tasks depend on the output of object detection algorithms, so improving detection accuracy is important. However, existing object detection algorithms usually process each object region separately and lack the ability to reason over the relationships between objects. Objects in images usually carry rich relational information, and ignoring it limits both the accuracy and the efficiency of detection. To improve detection accuracy, and inspired by the human process of recognition and inference, this thesis proposes two methods: a multimodal relational inference object detection method based on the Transformer and graph convolutional networks, and a relational fusion object detection method based on an attention mechanism and similarity matching. The main research contents of this thesis are as follows:

(1) Objects to be detected are often strongly related to one another and cannot be cleanly separated from their context. To address this, this thesis proposes a multimodal relational inference object detection method based on the Transformer and graph convolutional networks. First, the method introduces text-modality relations from an image captioning algorithm and models the relationships between objects, so that implicit associative information in the image can assist the detection task. Second, the visual-modality features are enhanced and enriched by drawing on an NLP model based on multi-head attention. Finally, the features with enhanced object information are passed to the classification and regression sub-networks for training. In this way, the detector can correct objects that were originally misdetected or missed, and can also accurately recognize some small objects.

(2) To address the problem that the multimodal relational inference network does not incorporate human recognition reasoning, this thesis proposes, on top of the above method, a relational fusion object detection method based on an attention mechanism and similarity matching. First, contextual prior information is obtained by constructing a knowledge graph, which simulates how the human brain stores experience and knowledge. Second, because the knowledge graph is relatively large, it is optimized by reducing redundant edges. Finally, a similarity matching module between the knowledge graph and the regions of interest is introduced to implement human-like visual inference and further enhance the feature representation, which is passed to the detection head network for training. In this way, the performance of the detector is improved by using prior knowledge for supervision and bias correction.

The proposed methods are compared experimentally and analyzed on the MS COCO and PASCAL VOC datasets. The experiments show that both proposed methods improve detection accuracy, by 0.7% and 1.2% respectively, compared with the Faster R-CNN network.
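
The abstract does not give implementation details, so the following is only a minimal sketch of what pruning redundant knowledge-graph edges and matching a region of interest against graph nodes could look like. Everything in it is an assumption made for illustration: the function names (`prune_edges`, `enhance_roi`), the co-occurrence weights, the threshold rule, the use of cosine similarity with a softmax, and the concatenation of the matched context onto the region feature. It is not the thesis's actual method.

```python
import math


def prune_edges(cooccurrence, threshold):
    """Keep only edges whose weight reaches the threshold (hypothetical pruning rule)."""
    return {pair: w for pair, w in cooccurrence.items() if w >= threshold}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enhance_roi(roi_feature, node_embeddings):
    """Append a similarity-weighted mix of graph node embeddings to a region feature.

    Assumes node embeddings have the same dimension as the region feature.
    Returns the enhanced feature and the per-node matching weights.
    """
    names = sorted(node_embeddings)
    weights = softmax([cosine(roi_feature, node_embeddings[n]) for n in names])
    context = [
        sum(w * node_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(roi_feature))
    ]
    return roi_feature + context, dict(zip(names, weights))


# Toy data, invented for this example only.
graph = {("person", "bicycle"): 0.6, ("person", "toaster"): 0.01}
kept = prune_edges(graph, threshold=0.05)

nodes = {"person": [1.0, 0.0], "bicycle": [0.0, 1.0]}
enhanced, weights = enhance_roi([0.9, 0.1], nodes)
```

In this toy setup the weak `("person", "toaster")` edge is dropped, and the region feature is matched more strongly to the `person` node than to `bicycle`. A real detector would operate on learned, high-dimensional features inside the network rather than on hand-written lists.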
Keywords/Search Tags: object detection, relational reasoning, multimodal relations, graph convolutional networks, attention mechanism, relational fusion