With the rapid development of China’s economy and accelerating urbanization, crowd gatherings are becoming increasingly common, which poses challenges for social governance. Crowd counting estimates the number of people in a scene and thereby supports higher-level cognitive tasks such as scene understanding and crowd analysis; it is therefore of great value for resource management and public safety. In recent years, crowd counting methods based on convolutional neural networks (CNNs) have attracted wide attention and achieved good counting performance. However, because of drastic variations in crowd scale and interference from complex backgrounds, existing methods still face many challenges. To address these problems, this paper studies how to improve the accuracy of crowd counting, building on existing crowd counting methods and CNNs.

First, to improve the network’s ability to extract crowd-scale information and key features from images, a multi-scale crowd counting method integrating spatial-channel attention is proposed. Exploiting the feature-extraction capability of the attention mechanism, a spatial-channel attention module is designed. After analyzing the characteristics of dilated convolution, a continuous dilated convolution module is constructed and combined with the spatial-channel attention module to form a global spatial-channel attention module. To improve the multi-scale perception of the network, features extracted from different blocks of the backbone network are fused through a multi-scale feature fusion structure. This method produces high-quality predicted density maps and improves counting accuracy.

Second, to enhance the network’s ability to perceive key features, a multi-scale crowd counting method integrating self-attention is proposed. To strengthen the feature extraction of the backbone network, a spatially separable self-attention module is added to it. After analyzing the advantages of multi-scale feature fusion, a multi-scale dilated convolution module is proposed. To enhance the multi-scale perception of the single-branch network, these two modules are combined and applied to the last three convolutional blocks of the backbone network. The resulting branch features are fused through the multi-scale feature fusion and back-end network structures to generate multi-level, multi-scale predicted density maps, yielding more accurate crowd counts.

Finally, the proposed methods are compared with existing crowd counting methods, and ablation experiments are carried out to verify their effectiveness and feasibility.
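To make the density-map formulation and the dilated-convolution ideas more concrete, here is a minimal, framework-free Python sketch. It is not the method described above: it assumes a single-channel map, zero padding, odd-sized kernels and dilation rates 1, 2 and 3, and it fuses parallel branches by simple averaging. The function names (`dilated_conv2d`, `multi_scale_dilated`, `receptive_field`, `count_from_density`) are hypothetical. It illustrates three standard facts. A dilated kernel widens coverage without adding weights. Cascading ("continuous") dilated layers grows the receptive field additively. With density-map regression, the estimated count is the sum over the predicted map.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Same-size 2-D dilated convolution (CNN-style cross-correlation), zero padding.

    Assumes an odd-sized kernel so the padding is symmetric.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    # A k x k kernel with dilation d spans (k - 1) * d + 1 pixels per side.
    ph, pw = (kh - 1) * dilation // 2, (kw - 1) * dilation // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r = i + a * dilation - ph
                    c = j + b * dilation - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[a][b] * x[r][c]
            out[i][j] = acc
    return out


def multi_scale_dilated(x: Grid, kernel: Grid, rates: Sequence[int] = (1, 2, 3)) -> Grid:
    """Parallel branches with different dilation rates, fused by averaging.

    Averaging is a simplifying assumption; a trained module would typically
    concatenate the branches and mix them with a learned 1x1 convolution.
    """
    branches = [dilated_conv2d(x, kernel, d) for d in rates]
    h, w = len(x), len(x[0])
    return [[sum(br[i][j] for br in branches) / len(branches) for j in range(w)]
            for i in range(h)]


def receptive_field(kernel_size: int, rates: Sequence[int]) -> int:
    """Receptive field of cascaded stride-1 dilated convolutions."""
    rf = 1
    for d in rates:
        rf += (kernel_size - 1) * d
    return rf


def count_from_density(density: Grid) -> float:
    """Density-map counting: the estimated count is the sum over the map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    # Toy head annotations: 1.0 at each annotated head, 0.0 elsewhere.
    n = 15
    heads = [[0.0] * n for _ in range(n)]
    heads[7][7] = 1.0
    heads[3][10] = 1.0
    box = [[1.0 / 9] * 3 for _ in range(3)]  # normalized 3x3 smoothing kernel
    density = multi_scale_dilated(heads, box)
    # 2.0: a kernel summing to 1 spreads mass but preserves the count away from borders
    print(round(count_from_density(density), 6))
    print(receptive_field(3, [1, 2, 3]))  # 13 = 1 + 2*1 + 2*2 + 2*3
```

Stacking three 3×3 layers with dilations 1, 2 and 3 yields a 13×13 receptive field, compared with 7×7 for three ordinary 3×3 layers, at the same parameter count. This is the usual motivation for dilated modules in crowd counting, where head sizes vary widely across a single image.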