Crowd counting and density map estimation are active research topics in computer vision and pattern recognition, with current and future applications in intelligent surveillance, traffic management, and public safety. In recent years, researchers have produced a great deal of innovative work on crowd counting, but the task still faces many challenges, such as severe occlusion, non-uniform density, and extreme congestion. Therefore, to address the low accuracy of current crowd counting, a robust deep learning-based crowd counting method is proposed. Focusing on how varying depth of field and occlusion degrade counting accuracy, we first exploit the observation that LeNet-5, AlexNet, and VGG-16 are suited to extracting objects at different depths of field. The convolution kernel sizes and network structures of these three classical models are adjusted so that each performs local perception over a receptive field of a different size, extracting head features at different scales in the image to improve counting accuracy. Then, a deep convolutional neural network architecture that integrates the three models is constructed, and a convolutional layer with 1×1 filters is placed at the back end of the network in place of the traditional fully connected layer to linearly weight the extracted feature maps, taking both the accuracy and the efficiency of the counting algorithm into account. Finally, the network outputs the estimated density map and the predicted number of people. Extensive experiments on public crowd counting datasets show that the proposed method outperforms existing traditional methods, and transfer learning experiments further show that the proposed model generalizes well.

Recently, novel crowd counting methods have emerged one after another, but they still cannot fully handle scale variation. To further improve crowd counting performance, and inspired by the significant improvement that the Receptive Field Block (RFB) brought to object detection, we integrate VGGNet, the RFB, and dilated convolution into a crowd counting algorithm that better simulates the receptive fields of the real human visual system. On this basis, a multi-scale network based on dilated convolution is proposed for scene crowd counting; it achieves accurate and fast counting by learning multi-scale contextual information in the image. The front end of the network consists of the first ten layers of VGG-16, a receptive field block is then embedded to extract multi-scale features, and a series of dilated convolutional layers serves as the back end. Most previous methods augment the training samples by randomly cropping images into patches, but the patches overlap and part of the global information is lost. In the reinforcement training phase of the network, the model is therefore trained on complete input images, so that it learns the full semantic information and spatial features of each image. Extensive experiments on common benchmark crowd counting datasets show that the proposed method outperforms state-of-the-art methods, and comparative experiments and ablation studies verify the generalization ability of the proposed model.
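The 1×1 back-end layer of the multi-model network can be illustrated with a minimal sketch. It assumes that the three columns (adapted from LeNet-5, AlexNet, and VGG-16) output feature maps of the same spatial size. A 1×1 convolution with a single output channel reduces to a per-pixel weighted sum across channels plus a bias, and the predicted count is the sum over the resulting density map. The function names, the 2×2 feature values, and the fusion weights below are hypothetical placeholders, not trained parameters of the described model.

```python
from typing import List

FeatureMap = List[List[float]]  # H x W grid of activations


def conv1x1(channels: List[FeatureMap], weights: List[float], bias: float = 0.0) -> FeatureMap:
    """Fuse C same-sized maps into one: out[i][j] = bias + sum_c weights[c] * channels[c][i][j]."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per input channel")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [bias + sum(w * ch[i][j] for w, ch in zip(weights, channels)) for j in range(width)]
        for i in range(height)
    ]


def count_from_density(density: FeatureMap) -> float:
    """The estimated number of people is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


# Hypothetical 2x2 outputs of three columns with small, medium and large receptive fields.
column_small = [[0.1, 0.2], [0.0, 0.3]]
column_medium = [[0.2, 0.1], [0.1, 0.2]]
column_large = [[0.3, 0.0], [0.2, 0.1]]

# Hypothetical fusion weights; in a real network these are learned.
density = conv1x1([column_small, column_medium, column_large], weights=[0.5, 0.3, 0.2])
print(round(count_from_density(density), 4))  # → 0.6
```

Because the fusion is linear and position-independent, it works on feature maps of any spatial size, which is one reason a 1×1 convolution can stand in for a fully connected layer here. In a framework such as PyTorch, the same operation corresponds to `torch.nn.Conv2d(in_channels, 1, kernel_size=1)`.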
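Why a dilated back end helps with scale can be shown by computing receptive fields. The following is a minimal sketch with two assumptions. First, the RFB branches are omitted. Second, the back end is modeled as six 3×3 layers with dilation rate 2; the abstract does not state the number of back-end layers or their rates, so these values are hypothetical. The front end follows the standard VGG-16 layout for its first ten convolutional layers, including the three max-pooling layers between them. A k×k kernel with dilation d spans d(k−1)+1 input positions per side, and each layer enlarges the receptive field by (span − 1) times the accumulated stride.

```python
from typing import List, NamedTuple


class Layer(NamedTuple):
    kernel: int
    stride: int = 1
    dilation: int = 1


def effective_kernel(layer: Layer) -> int:
    """A k x k kernel with dilation d covers d*(k-1)+1 input positions per side."""
    return layer.dilation * (layer.kernel - 1) + 1


def receptive_field(layers: List[Layer]) -> int:
    """Side length (in input pixels) of the region seen by one output unit."""
    rf, jump = 1, 1  # jump: input-pixel distance between adjacent units at this depth
    for layer in layers:
        rf += (effective_kernel(layer) - 1) * jump
        jump *= layer.stride
    return rf


conv = Layer(kernel=3)
pool = Layer(kernel=2, stride=2)
# First ten convolutional layers of VGG-16, with its three max-pooling layers.
front_end = [conv, conv, pool, conv, conv, pool, conv, conv, conv, pool, conv, conv, conv]

# Hypothetical back ends: six 3x3 layers without and with dilation rate 2.
plain_back_end = [Layer(kernel=3)] * 6
dilated_back_end = [Layer(kernel=3, dilation=2)] * 6

print(receptive_field(front_end))                     # → 92
print(receptive_field(front_end + plain_back_end))    # → 188
print(receptive_field(front_end + dilated_back_end))  # → 284
```

Under these assumptions, dilation enlarges each output unit's context from 188 to 284 pixels without adding parameters (a dilated 3×3 kernel still has nine weights per channel pair) and without further pooling, so the density map keeps the 1/8 resolution of the front end.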