Object detection, an important subtask of computer vision, has broad applications in autonomous driving, face detection, traffic monitoring, and medical image processing. With the rapid development of convolutional neural networks (CNNs), object detection has shifted from traditional handcrafted methods to deep learning-based methods, and numerous CNN-based detection models have emerged, steadily raising detection accuracy on a range of object detection datasets. The Feature Pyramid Network (FPN) is an architecture for improving multi-scale feature extraction and detection, and it has been widely adopted in both single-stage and two-stage CNN-based detectors. However, the FPN structure still has design flaws that limit final detection accuracy. This thesis identifies a series of problems in the FPN structure and proposes improved models to address them. The main research contents are as follows:

To address the information loss and the imbalanced receptive-field coverage in the highest-level feature map of the FPN, Chapter 3 proposes a Multi-Receptive Field (MRF) module based on feature fusion. The MRF module fully extracts and retains information from the highest level, mitigating the impact of the large change in feature dimensionality at that level. By concatenating feature maps with multiple receptive fields derived from the highest level of the feature pyramid, it integrates different receptive fields to improve the detection of objects at different scales, further improving the model's multi-scale generalization. Experimental results show that the MRF module effectively improves detection accuracy.

To address the insufficient fusion of features across levels and the weakening of small-object features in the low-level feature maps of the FPN, Chapter 4 proposes an improved feature pyramid based on feature fusion and low-level feature enhancement, comprising the CIF and ESF modules. Effective features are extracted through stacked convolutional layers and channel attention, and the learned channel attention weights further strengthen feature extraction. A bottom-up concatenation along the channel dimension and a module for reducing the mixing effect are used to alleviate the mixing effect and strengthen the fusion of features at different levels. Experimental results show that the proposed methods effectively improve detection accuracy.

Chapter 5 proposes a new feature pyramid structure, the Feature-Integration Feature Pyramid Network (FI-FPN), which integrates the MRF, CIF, and ESF modules to address these feature pyramid problems jointly. Experiments on the COCO object detection dataset show that, with a ResNet-50 backbone, FI-FPN significantly improves Faster R-CNN and RetinaNet, raising the mean Average Precision (mAP) by 3.5 and 4.6 points, respectively. Compared with other FPN-based improvement methods, the proposed approach enhances the model's multi-scale generalization, alleviates the mixing effect, and strengthens the fusion of features at different levels, thereby effectively improving detection accuracy.
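The abstract does not specify how the MRF module is built internally. As a rough illustration of the general idea of fusing several receptive fields at the top pyramid level, the following NumPy sketch assumes parallel 3×3 dilated convolutions (dilations 1, 2 and 3, a choice in the spirit of ASPP-style designs), concatenates their outputs along the channel axis, and fuses them with a 1×1 convolution. The function names `dilated_conv3x3` and `mrf_block`, the channel counts, and the random weights are hypothetical. This is a sketch under those assumptions, not the thesis's implementation.

```python
import numpy as np


def dilated_conv3x3(x, w, dilation):
    """3x3 convolution (cross-correlation, as in CNN layers) with zero 'same' padding.

    x: input feature map of shape (C_in, H, W)
    w: kernel of shape (C_out, C_in, 3, 3)
    dilation: spacing between kernel taps; effective kernel size is 2 * dilation + 1
    """
    _, h, wd = x.shape
    pad = dilation  # this padding keeps H and W unchanged for a 3x3 kernel
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Input window seen by kernel tap (i, j), offset by (i-1, j-1) * dilation
            window = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], window)
    return out


def mrf_block(x, branch_weights, dilations, fuse_weight):
    """Hypothetical multi-receptive-field block for the top pyramid level.

    Each branch is a dilated 3x3 conv + ReLU with a different receptive field;
    branch outputs are concatenated on the channel axis and fused by a 1x1 conv.
    """
    branches = [
        np.maximum(dilated_conv3x3(x, w, d), 0.0)
        for w, d in zip(branch_weights, dilations)
    ]
    stacked = np.concatenate(branches, axis=0)  # (sum of branch channels, H, W)
    return np.einsum("oc,chw->ohw", fuse_weight, stacked)  # 1x1 convolution


# Toy usage with random weights (all sizes are illustrative assumptions)
rng = np.random.default_rng(0)
c5 = rng.standard_normal((8, 16, 16))  # stand-in for the highest-level backbone map
dilations = (1, 2, 3)
branch_w = [0.1 * rng.standard_normal((4, 8, 3, 3)) for _ in dilations]
fuse_w = 0.1 * rng.standard_normal((8, 4 * len(dilations)))
p5 = mrf_block(c5, branch_w, dilations, fuse_w)
print(p5.shape)  # (8, 16, 16)
print([2 * d + 1 for d in dilations])  # effective kernel sizes: [3, 5, 7]
```

Concatenation keeps each branch's response separate until the learned 1×1 fusion, so the network can weight small and large receptive fields per output channel rather than averaging them in advance. In a real detector these layers would be trainable modules (for example in PyTorch), not fixed NumPy arrays.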