| Crowd counting estimates the number of people in a crowd and its density distribution by extracting and analyzing crowd features. A common strategy extracts crowd features at different scales with multiscale convolutional neural networks (CNNs) and then fuses them to yield the final density estimate. However, crowd information is lost through the downsampling operations in CNNs and through the model-averaging effect that the fusion method induces in multiscale CNNs, so this strategy does not necessarily produce accurate estimates. Accordingly, this study proposes a novel model named multiscale crowd counting via adversarial dilated convolutions. The model builds on the dilated convolution model that was proposed for image semantic segmentation. In image segmentation, the most common approach is to use a CNN: images are fed into the network, which applies convolution and then pooling. As a result, the image size is reduced and the receptive field of the network is increased. However, image segmentation is a pixel-wise problem, so the reduced image must be upsampled back to the original image size for prediction (a deconvolution operation is generally used for upsampling). Two key steps therefore exist in image segmentation: pooling, which reduces the image size and increases the receptive field, and upsampling, which enlarges the image size. Information may be lost in the processes of reducing and resizing. Dilated convolution, which keeps the scale consistent throughout the network, was proposed to solve this issue. Its principle is to remove the pooling layers that reduce the resolution of the feature map. Without pooling, however, the model cannot learn a global view of the image, and simply enlarging the convolution kernel sharply increases the computation and overloads the memory. Instead, we can expand the original 
convolution kernel by a given dilation rate and fill the empty positions with zeros, which enlarges the convolution kernel and increases the receptive field. In this way, the receptive field widens because the convolution kernel is expanded, while the computation remains unchanged because the number of effective computation points in the kernel remains unchanged. The scale of each feature map is unchanged, and thus the image information is preserved. The proposed model is based on adversarial dilated convolutions. On the one hand, dilated convolution extracts the features of the input image without losing resolution, and the module uses different dilated convolutions to aggregate multiscale context information. On the other hand, the adversarial loss function fuses information from different scales in a collaborative manner, which improves the accuracy of the estimation results. The proposed method reduces the mean absolute error (MAE) and the mean squared error (MSE) to 60.5 and 109.7 on Part A of the ShanghaiTech dataset and to 10.2 and 15.3 on Part B, respectively. Compared with existing methods, the MAE improves by 7.7 and 0.4 on the two parts. A comprehensive analysis of five sets of video sequences from the WorldExpo'10 database demonstrates that the average prediction result improves by 0.66 compared with that of the classical algorithm. On the UCF_CC_50 dataset, the MAE and MSE improve by 18.6 and 22.9, respectively, which shows that the method is notably effective in environments with complex scenes. On the UCSD database, however, the MAE is reduced to 1.02 but the MSE does not improve; the adversarial loss function limits the robustness of crowd counting in low-density environments. A new learning strategy named multiscale crowd counting via adversarial dilated convolutions is proposed in this study. The network uses dilated convolutions to preserve significant image information, and dilated convolutions with 
different dilation rates aggregate multiscale contextual information, which addresses the problem of counting heads that appear at different scales in crowd scene images because of differences in viewing angle. The adversarial loss function uses the image features extracted by the network to estimate crowd density. Experimental results show that the model adapts well to scenes with large crowds. The model can estimate the density distribution for different scenes and count the crowd accurately. |
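
The kernel-expansion idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' network; the helper names `dilate_kernel` and `conv2d_same`, the 3×3 kernel, and the dilation rates 1, 2, and 4 are assumptions chosen only for illustration. The sketch inserts zeros between kernel weights, then checks that the kernel footprint grows with the rate while the number of non-zero weights and the feature-map size stay the same.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel weights (hypothetical helper)."""
    k_h, k_w = kernel.shape
    out = np.zeros(((k_h - 1) * rate + 1, (k_w - 1) * rate + 1), dtype=kernel.dtype)
    out[::rate, ::rate] = kernel  # original weights land on a grid spaced by `rate`
    return out

def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding; output size equals input size.

    Assumes an odd-sized kernel. This naive loop also multiplies the inserted
    zeros; an efficient implementation would skip them.
    """
    k_h, k_w = kernel.shape
    pad_h, pad_w = k_h // 2, k_w // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + k_h, j:j + k_w] * kernel)
    return out

kernel = np.arange(1.0, 10.0).reshape(3, 3)         # illustrative 3x3 weights
image = np.random.default_rng(0).random((16, 16))   # stand-in for a feature map

for rate in (1, 2, 4):
    dilated = dilate_kernel(kernel, rate)
    features = conv2d_same(image, dilated)
    # footprint grows (3x3, 5x5, 9x9); non-zero weights stay at 9; resolution is kept
    print(rate, dilated.shape, np.count_nonzero(dilated), features.shape)
```

With no pooling and stride 1, every rate yields a feature map of the input size, so outputs from several rates can be combined directly to aggregate multiscale context.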