| Pedestrian detection is an active topic in computer vision that aims to identify and locate pedestrians accurately in images or videos. It underpins many visual tasks, such as person re-identification, human pose estimation, and behavior analysis, and it is applied in industrial fields such as intelligent security, driver-assistance systems, and intelligent robots. Pedestrian detection therefore plays a critical role in both academic research and industrial applications. With the rapid development of deep learning, pedestrian detection research has recently made many breakthroughs, and methods based on deep convolutional neural networks have become the mainstream. Currently, general object detectors achieve the expected results in standard pedestrian detection, but pedestrian detection in crowded scenes still presents many challenges. As pedestrian density increases, so does the occlusion between pedestrians, and conventional detectors struggle to detect occluded pedestrians in such scenes. Moreover, pedestrians exhibit characteristics of both rigid and flexible objects and vary in scale, clothing, and posture, all of which increase the difficulty of pedestrian detection in crowded scenes. In view of these limitations, this thesis builds on deep-learning-based pedestrian detection models, improves the specific network implementation to handle pedestrian detection in crowded scenes more effectively, and introduces the proposed modules to optimize the feature extraction of the pedestrian detection network. The main research work of this thesis covers the following three aspects: (1) For the pedestrian occlusion problem in crowded scenes, most pedestrian detection methods either use additional annotation information available in some datasets (such as head
annotations or visible-part annotations) or use attention mechanisms to emphasize the features of the visible parts of pedestrians. However, few methods use feature context information to enhance the features extracted by the backbone network and thereby compensate for the incomplete information caused by occlusion. Therefore, a global context-aware module is proposed, which effectively combines feature context and, guided by the global context, obtains more accurate features for occluded pedestrians. In addition, the prominent upper-body features of occluded pedestrians are enhanced so that these discriminative features contribute more to the final decision. The proposed method effectively improves pedestrian detection performance in crowded scenes and reduces many missed and false detections caused by occlusion. (2) For the multi-scale problem of pedestrian targets in crowded scenes, this thesis improves the feature pyramid network. The deep-layer features of the convolutional neural network, which carry high-level semantic information, are fused with its high-resolution shallow-layer features. The shallow features are fused into the deep ones from bottom to top by weighted fusion, so that the network can assign different weights to the fused features. The fully fused multi-layer feature outputs respond more effectively to variations in target scale. Because the absence of complete feature information may prevent the detector from localizing the target accurately, a module that fuses spatial information is also proposed; it uses spatial feature information to help the network recover potential pedestrian features and compensate for the salient feature information lost to occlusion. (3) Two-stage pedestrian detection methods have certain advantages over one-stage methods in detection accuracy, but their network models are generally too large. To effectively reduce the size of the model and improve
the detection efficiency, the lightweight network MobileNet is used as the backbone. To enhance the ability to extract pedestrian features in crowded scenes, Transformer-encoder-based feature fusion is added to the backbone structure, which makes the multi-layer feature outputs of the network more robust. To handle overlapping pedestrians in crowded scenes, this thesis uses multi-stage regression, which progressively combines existing prediction information to regress pedestrian positions more accurately. Based on these three lines of work, validation experiments are carried out on the challenging CrowdHuman and CityPersons datasets. The effectiveness of the proposed methods is verified through extensive experimental comparison and metric evaluation. |
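
The bottom-up weighted fusion described in aspect (2) can be illustrated with a minimal sketch. The abstract does not specify the fusion formula, so everything below is an assumption made for illustration: the function names (`downsample2x`, `weighted_fuse`, `bottom_up_fusion`) are hypothetical, feature maps are single-channel grids, downsampling is 2x2 average pooling, and the weights are clamped to be non-negative and normalized to sum to roughly one. In a real network the weights would be learned parameters rather than fixed numbers.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def downsample2x(fmap: Grid) -> Grid:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    rows, cols = len(fmap) // 2, len(fmap[0]) // 2
    return [
        [
            (fmap[2 * r][2 * c] + fmap[2 * r][2 * c + 1]
             + fmap[2 * r + 1][2 * c] + fmap[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def weighted_fuse(shallow: Grid, deep: Grid, w_shallow: float, w_deep: float,
                  eps: float = 1e-4) -> Grid:
    """Fuse two same-sized maps with non-negative, normalized weights.

    Assumed scheme: clamp each weight at zero, then divide by their sum
    (plus eps for numerical stability) so the weights act as a soft mix.
    """
    ws, wd = max(w_shallow, 0.0), max(w_deep, 0.0)
    total = ws + wd + eps
    return [
        [(ws * s + wd * d) / total for s, d in zip(srow, drow)]
        for srow, drow in zip(shallow, deep)
    ]


def bottom_up_fusion(levels: List[Grid],
                     weights: List[Tuple[float, float]]) -> List[Grid]:
    """Propagate shallow features into deeper levels, one level at a time.

    levels[0] is the shallowest (highest-resolution) map, and each later
    level is assumed to be exactly half the size of the previous one.
    weights[i] holds (shallow weight, deep weight) for fusing into level i + 1.
    """
    fused = [levels[0]]
    for i in range(1, len(levels)):
        carried = downsample2x(fused[-1])  # match the deeper level's size
        w_shallow, w_deep = weights[i - 1]
        fused.append(weighted_fuse(carried, levels[i], w_shallow, w_deep))
    return fused


# Toy pyramid: a 4x4 shallow map and a 2x2 deep map, weighted equally.
shallow_map = [[1.0] * 4 for _ in range(4)]
deep_map = [[3.0] * 2 for _ in range(2)]
outputs = bottom_up_fusion([shallow_map, deep_map], [(1.0, 1.0)])
```

With equal weights, each fused deep-level value sits close to the midpoint of the shallow and deep inputs; raising one weight shifts the output toward that level, which is the sense in which the network can assign different weights to the fused features.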