| Object detection is an important branch of computer vision. With the rapid development of deep learning and the improvement of related algorithms in recent years, it has been widely applied in intelligent surveillance, image retrieval, and autonomous driving. However, existing object detection algorithms need highly accurate instance-level labels to train their detection models, and producing these labels requires substantial human and material resources to annotate the location of every object in every image, which limits large-scale deployment of the corresponding algorithms. Weakly supervised object detection models use only image-level labels as supervision during training, which greatly reduces the difficulty of obtaining labels. A weakly supervised object detection algorithm typically first generates region proposals for each image and then applies multiple instance learning to the features of these proposals to classify and localize objects. However, the mismatch between the training objective and the supervision available for training leads to two prominent problems in existing weakly supervised object detection algorithms: (1) the model focuses on the most discriminative part of an object and ignores the less discriminative parts, so it assigns unreasonable proposal scores; (2) most of the initial proposals generated by existing methods do not enclose the object tightly and do not fit its ground-truth box well, making it difficult for the detector to localize the object accurately.
To address the first problem, this thesis proposes a weakly supervised object detection framework based on extended proposal contrast enhancement, which refines the initial proposal scores using additional contrast integrity semantics. First, mimicking the way humans judge by comparison in real life, the method extends each initial region proposal to obtain an extended region proposal in each direction. Next, a recurrent neural network is used to build an encoder that converts the spatial features of the region proposals into sequence features, and a decoder with a miniature two-stream branch structure is built to constrain the encoding space of the encoder. Finally, the contrast integrity semantics are obtained by extracting the subtle differences between the initial region proposal and the extended region proposals, and they serve as an additional basis for scoring the initial region proposal. Experiments on the public datasets PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate the effectiveness of the method, and its detection performance is comparable to that of advanced existing methods.
To address the second problem, this thesis obtains higher-quality region proposals by refining each boundary of the initial region proposal with a cyclic region proposal expansion method. First, the method generates inward-tightened and outward-extended region proposals from each initial region proposal and extracts sequence features of the initial and expanded region proposals with a dedicated encoder-decoder structure. Next, a directional expansion matrix and a cyclic expansion matrix are constructed from the Jensen-Shannon (JS) divergence between the sequence features of the initial proposal and those of the expanded proposals. The directional expansion matrix controls the direction in which a region proposal expands during cyclic expansion, and the cyclic expansion matrix controls when the expansion of a region proposal stops. Finally, the detector selects a suitable proposal from the region proposals that have completed cyclic expansion as the detection box of the object. Experiments on public datasets demonstrate the effectiveness of the method, and its detection performance is comparable to that of advanced existing methods. |
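
The two building blocks named above, moving one boundary of a region proposal and comparing two proposals through the JS divergence of their features, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `(x1, y1, x2, y2)` box format, the fixed pixel `step`, and the use of a softmax to turn a feature vector into a probability distribution are all assumptions made for illustration, and the encoder-decoder that would produce the sequence features is replaced by plain lists of numbers.

```python
import math

def expand_box(box, direction, step):
    """Move one side of a box (x1, y1, x2, y2) by `step` pixels.

    A positive step extends the box outward on that side and a negative
    step tightens it inward. Names and box format are hypothetical.
    """
    x1, y1, x2, y2 = box
    if direction == "left":
        x1 -= step
    elif direction == "top":
        y1 -= step
    elif direction == "right":
        x2 += step
    elif direction == "bottom":
        y2 += step
    else:
        raise ValueError(f"unknown direction: {direction}")
    return (x1, y1, x2, y2)

def softmax(values):
    """Turn a feature vector into a probability distribution (assumption)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-mass terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log, so bounded by ln 2)."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)

def directional_divergences(initial_feat, expanded_feats):
    """JS divergence between the initial proposal and each expanded proposal.

    `expanded_feats` maps a direction name to a feature vector (hypothetical).
    """
    p = softmax(initial_feat)
    return {d: js_divergence(p, softmax(f)) for d, f in expanded_feats.items()}
```

A small divergence for one direction would suggest that moving that boundary changes the proposal's features very little, while a large one would suggest the boundary has crossed into different content. How these values are assembled into the directional and cyclic expansion matrices is not specified in the abstract and is left out here.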