Research On Object Detection Algorithm Based On Deep Convolutional Neural Network

Posted on: 2024-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2568307142952349
Subject: Computer technology
Abstract/Summary:
In recent years, with strong national support for the field of artificial intelligence, object detection, one of the most basic tasks in computer vision, has been widely used in security monitoring, face recognition, autonomous driving, intelligent transportation, smart cities, and many other application scenarios. However, multi-scale target detection still has pain points, especially for small targets. This thesis therefore studies multi-scale target detection from the perspective of multi-scale feature fusion, and uses a multi-scale feature fusion algorithm to alleviate the low accuracy of multi-scale target detection. On this basis, it also studies how to improve the detection accuracy of small targets, and designs a fine-grained attention mechanism that uses context information to improve small-target detection. The main contents are summarized as follows.

First, to address the low accuracy of multi-scale target detection, this thesis designs an algorithm based on semantic multi-scale feature fusion. Multi-scale convolution kernels are used to generate the multi-scale feature information required by the target detection network. A multi-scale feature fusion method then fuses these features, yielding multi-scale information that mixes shallow geometric information with high-level semantic information. To enhance the features further, the SE (Squeeze and Excitation) attention mechanism is introduced to re-calibrate the fused multi-scale features with cross-channel weights, which effectively enhances the multi-scale information of the network. As a result, the algorithm improves the precision of target detection, especially for small targets, and is more robust to targets of different scales. The feature fusion module can be embedded into any target detection network. Compared with the baseline YOLOX, the proposed method improved detection accuracy on the MS COCO 2017 test and Pascal VOC datasets by 0.8% and 0.9%, respectively.

Second, the channel and spatial attention mechanisms applied to C×W×H features in existing target detection networks usually compute and assign self-attention weights only after compressing information over all C channels and the whole space W×H. This coarse-grained global operation ignores the differences between channels and between spatial regions, leading to large errors in the computed attention weights. In addition, mining the context information in the space W×H remains a challenge for target recognition and localization. To address these problems, this thesis proposes a Fine-Grained Dual-Level Attention Mechanism joint Spatial Context Information Fusion module (FGDLAM&SCIFF). FGDLAM&SCIFF combines channel and spatial attention in series. The channel attention module divides the feature space W×H into n (n=4) subspaces and uses global adaptive pooling with one-dimensional convolution to extract the channel weights of each subspace effectively. In the spatial attention module, the C feature channels are divided into R (R=4) groups. A multi-scale module is then constructed in the feature space W×H to mine context information, and orthogonal fusion by row and column coding is further adopted to obtain enhanced features. The module is an embeddable, general-purpose feature enhancement network. It was transplanted into the classical target detection networks YOLOv4, YOLOv5, PPYOLOE, and YOLOX and tested on the MS COCO 2017 and Pascal VOC 2007 datasets, verifying the effectiveness and portability of the FGDLAM&SCIFF attention strategy for target detection. Compared with the baseline YOLOX, accuracy on the MS COCO 2017 test and Pascal VOC datasets improved by 2.0% and 1.7%, respectively.
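The SE re-calibration step mentioned in the first contribution can be illustrated with a minimal NumPy sketch. It follows the standard Squeeze-and-Excitation pattern (global average pooling, a two-layer bottleneck, sigmoid gating) rather than the thesis's actual implementation. The function name `se_recalibrate`, the reduction ratio `r = 4`, the tensor sizes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(features, w_reduce, w_expand):
    """Standard SE gating on a (C, H, W) feature map.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r);
    both are hypothetical stand-ins for learned weights.
    """
    squeezed = features.mean(axis=(1, 2))           # squeeze: one value per channel
    hidden = np.maximum(w_reduce @ squeezed, 0.0)   # bottleneck with ReLU
    weights = sigmoid(w_expand @ hidden)            # excitation: gates in (0, 1)
    return features * weights[:, None, None], weights

# Illustrative sizes only: 8 channels, 6x6 spatial map, reduction ratio 4.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
fused = rng.standard_normal((C, H, W))              # stands in for fused multi-scale features
w_reduce = rng.standard_normal((C // r, C)) * 0.1
w_expand = rng.standard_normal((C, C // r)) * 0.1
recalibrated, channel_weights = se_recalibrate(fused, w_reduce, w_expand)
```

Each channel of the fused map is scaled by a single weight derived from its global average, which is the "cross-channel weights" idea the abstract describes.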
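The fine-grained channel attention in the second contribution (n=4 subspaces, global pooling, one-dimensional convolution) might look roughly like the sketch below. The abstract does not say how W×H is partitioned or which kernel size is used, so the 2×2 grid of quadrants, the length-3 kernel, the zero padding, and the name `subspace_channel_attention` are assumptions, not the author's design.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Cross-correlate a 1-D signal with an odd-length kernel, zero-padded to keep length."""
    pad = len(kernel) // 2
    padded = np.pad(x, pad)
    # np.convolve flips its second argument, so reverse the kernel first.
    return np.convolve(padded, kernel[::-1], mode="valid")

def subspace_channel_attention(features, kernel, grid=(2, 2)):
    """Compute separate channel weights for each spatial subspace of a (C, H, W) map.

    grid=(2, 2) gives n=4 subspaces; H and W must be divisible by the grid.
    """
    C, H, W = features.shape
    gh, gw = grid
    assert H % gh == 0 and W % gw == 0, "spatial size must divide evenly"
    hs, ws = H // gh, W // gw
    out = np.empty_like(features)
    for i in range(gh):
        for j in range(gw):
            rows = slice(i * hs, (i + 1) * hs)
            cols = slice(j * ws, (j + 1) * ws)
            block = features[:, rows, cols]
            pooled = block.mean(axis=(1, 2))                  # global average pool per subspace
            logits = conv1d_same(pooled, kernel)              # 1-D conv across the channel axis
            weights = 1.0 / (1.0 + np.exp(-logits))           # sigmoid gates
            out[:, rows, cols] = block * weights[:, None, None]
    return out

# Illustrative input: 8 channels on an 8x8 map with a hypothetical kernel.
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 8))
y = subspace_channel_attention(x, kernel=np.array([0.25, 0.5, 0.25]))
```

Because the pooling runs per quadrant, the same channel can receive a different weight in each spatial region, which is the contrast with the coarse-grained global pooling the abstract criticizes.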
Keywords/Search Tags: multi-scale feature fusion, fine-grained attention mechanism, spatial context information, object detection, multi-scale target detection