| Around the Spring Festival in 2020, COVID-19 spread across the world and the number of infections soared, which was a huge test for epidemic prevention. One effective measure is to prevent crowds from gathering. However, counting the number of people in an area manually is tedious, and the rise of deep learning makes the task fast and convenient. This paper designs a novel and practical neural network model that takes a static image as input and outputs a people count close to the true value, which can serve as a basis for early warning about how crowded an area is. Crowd counting also plays an important role in traffic control and security. Since research on crowd counting began, however, it has faced several difficulties. Real scenes are complex and chaotic and contain a lot of background noise, which is easily confused with human heads. In addition, the camera position is fixed, so perspective makes nearby objects look large and distant ones look small: heads close to the camera appear larger than heads far from it, and the model must therefore recognize targets at multiple scales. This paper designs a crowd counting method with good accuracy and robustness that reduces the impact of perspective distortion and background noise. The specific contributions are as follows. To address the interference caused by background noise, this paper designs an attention map generation module that embeds context information, so that the model can learn the overall relationships within the image and handle large scale variations. On this basis, the foreground (crowd area) and background of the image are segmented to generate a mask image. In addition, a spatial attention mechanism is applied to the intermediate features to suppress the image background, and a channel attention mechanism is applied to suppress invalid features. To address the multi-scale problem caused by perspective distortion, this paper designs a multi-scale feature fusion module. The module contains densely connected dilated (atrous) convolutions whose dilation rates are chosen so that multiple branches together provide a wide range of receptive fields at continuous scales, capturing head features of different sizes. A deformable convolution mechanism is added to each dilated convolution, which removes the restriction that a standard convolution can only sample on a rectangular grid and lets the model extract features that follow the edge of the human head. Furthermore, when high-level and low-level features are fused, the high-level feature map is not up-sampled by plain bilinear interpolation. Instead, following a principle similar to deformable convolution, each sampling point learns an offset during up-sampling, so that the up-sampled high-level feature map is not misaligned with the low-level feature map when the two are fused, and feature alignment is achieved. Extensive experiments were carried out on the crowd counting datasets Shanghai Tech, UCF_CC_50, and Mall, and the proposed method improves accuracy and robustness to varying degrees compared with some advanced crowd counting algorithms. In particular, on Shanghai Tech Part-A, both the mean absolute error and the mean squared error decrease by more than 7%. The experimental results show that the proposed crowd counting method is more effective and practical than some mainstream crowd counting methods. |
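
The idea that densely connected dilated (cavity) convolutions can yield receptive fields at continuous scales can be illustrated with a little arithmetic. The sketch below is not the paper's implementation: it assumes 3x3 kernels, stride 1, and hypothetical dilation rates (1, 2, 3), none of which are stated in the abstract. Because a dense connection feeds every layer the outputs of all earlier layers, any ordered subset of the layers forms a valid path, and each path has its own receptive field.

```python
from itertools import combinations


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a single dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(kernel_size, d) - 1
    return rf


def dense_receptive_fields(kernel_size: int, dilations) -> list:
    """Receptive fields reachable through any path of a densely connected stack.

    Each ordered subset of layers is one path; the empty subset is the
    input feature itself (receptive field 1).
    """
    sizes = set()
    for n in range(len(dilations) + 1):
        for subset in combinations(dilations, n):
            sizes.add(stacked_receptive_field(kernel_size, subset))
    return sorted(sizes)


if __name__ == "__main__":
    rates = (1, 2, 3)  # hypothetical dilation rates, chosen for illustration
    print(dense_receptive_fields(3, rates))
```

With these assumed rates the reachable receptive fields are 1, 3, 5, 7, 9, 11, and 13 pixels, with no gaps between consecutive odd sizes. Poorly chosen rates (for example 1, 8, 16) would leave large gaps, which is one reason the choice of rates matters.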