| In recent years, safety accidents caused by unexpected large-scale crowd gatherings have occurred frequently. Crowd counting is an important reference for crowd-flow management. It uses computer vision to obtain accurate crowd counts and crowd distributions from images or videos in a timely manner, and it plays an important role in security early warning, urban planning, public management, and other fields. With the rapid development of deep learning, crowd counting methods based on convolutional neural networks have achieved good performance. However, because people are distributed non-uniformly and pedestrian scale varies drastically in complex scenes, existing crowd counting methods still face great challenges. To address these problems and further improve counting performance, this thesis conducts in-depth research on data augmentation, loss optimization, and network structure, and proposes a practical solution for each. The main contents of this thesis are as follows: (1) Spatial Density Adaptive Data Augmentation (SDA_DA) is proposed to balance the density differences among training samples. SDA_DA applies scaling with three scale factors, horizontal flipping, and random cropping to the training images. It then computes density statistics for the randomly cropped image patches and probabilistically retains patches at different density levels. The experimental results show that this method makes the training samples more effective and improves the counting performance of the network. (2) A Pooling Loss (PLoss) based on local density normalization is proposed to reduce the loss differences between different density distribution patterns. Building on a loss computed with local density normalization, PLoss divides the density map into subregions by a pooling operation, calculates the relative prediction error of each subregion, and sums the errors of all subregions to obtain the final loss value. The experimental
results show that this method improves the accuracy and generalization ability of the counting model. (3) The Multi-scale Feature Aggregation Network (MSFANet) is proposed to identify crowd objects of different scales in complex scenes. The network establishes short-connection feature aggregation modules between adjacent convolution blocks to aggregate receptive-field information of adjacent sizes. Long-connection feature aggregation modules are established between the low-level convolution block close to the network input and the high-level layers; by reusing the low-level features multiple times, they aggregate low-level spatial detail with high-level semantic information. A feature combination unit is designed to further enhance the expression of low-level global features. Tests on four challenging datasets validate the effectiveness of the method. |
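
The PLoss description in item (2) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `sum_pool` and `pooling_loss`, the use of non-overlapping sum pooling, the window size `k`, and the smoothing constant `eps` in the denominator are all assumptions made for illustration. It is written in plain Python to stay self-contained; in practice the pooling step would normally be a pooling operator in a deep learning framework.

```python
def sum_pool(density, k):
    """Sum non-overlapping k x k windows of a 2-D density map (list of rows).

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    h, w = len(density), len(density[0])
    pooled = []
    for top in range(0, h - h % k, k):
        row = []
        for left in range(0, w - w % k, k):
            row.append(sum(density[y][x]
                           for y in range(top, top + k)
                           for x in range(left, left + k)))
        pooled.append(row)
    return pooled


def pooling_loss(pred, gt, k=2, eps=1.0):
    """Sum of per-subregion relative count errors.

    eps is an assumed smoothing term that keeps empty subregions
    from causing a division by zero.
    """
    pooled_pred, pooled_gt = sum_pool(pred, k), sum_pool(gt, k)
    return sum(abs(p - g) / (g + eps)
               for pred_row, gt_row in zip(pooled_pred, pooled_gt)
               for p, g in zip(pred_row, gt_row))


# Hypothetical 4x4 maps: a sparse region (2 people) and a dense one (8 people).
gt = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 4, 0],
      [0, 0, 0, 4]]
# The prediction over-counts by one person in each of the two regions.
pred = [[2, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 4, 0],
        [0, 0, 0, 5]]

loss = pooling_loss(pred, gt, k=2)
```

In this toy case both regions have the same absolute error of one person, but the sparse region contributes 1/3 to the loss and the dense region only 1/9, which is the kind of density-dependent normalization the text describes.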