With a population of about 1.4 billion, China is the most populous country in the world. Consequently, there are many occasions on which large-scale crowds gather, such as New Year's Eve celebrations, religious activities, and tourism at popular attractions. On such occasions, stampede accidents occur frequently, resulting in unpredictable casualties and property losses, and these sudden events pose a serious threat to social and public safety. Hence, the study of crowd counting is significant.

In recent years, methods based on deep learning have become the mainstream in crowd counting. A deep convolutional neural network can predict a density map, which reflects both the detailed spatial distribution of the crowd and the total number of people in the scene. However, owing to complex scenes, scale variations, hard examples in the crowd, and high annotation costs, the robustness of existing models still needs to be improved before they can be applied to real scenes. To solve these problems, improve the robustness of models, and better accomplish the crowd counting task, this dissertation completes the following work.

1. Under the influence of complex backgrounds, models easily mistake the background for the crowd, which leads to large counting errors. To suppress this interference, we propose two schemes. In the first scheme, a segmentation attention mechanism is proposed so that the model can adaptively highlight human head regions and suppress non-head regions: the crowd region is segmented out through a binary segmentation task, suppressing the background region, while the model also predicts a rough density map under relay supervision. The segmentation map and the density map are then added element-wise. Guided by the segmentation attention mechanism, the model pays more attention to human head regions and automatically encodes a high-precision density map. In addition, different datasets have different distributions of crowd
counts. The model can also adapt to them automatically through a classification task. In the second scheme, the proposed model adaptively estimates the probability that each pixel belongs to a human head, avoiding severe misjudgments in crowd counting. Specifically, the model predicts a confidence map and a rough density map at the same time; the two maps are then multiplied to suppress the influence of the background. Additionally, a novel classification component that accepts inputs of arbitrary size is designed to train the crowd-count classification task, explicitly mapping the category prior back into the model. A high-precision density map with a robust population distribution is thereby encoded automatically. Multiple common datasets are used to validate the effectiveness of the proposed schemes. However, models still struggle with difficult samples in the crowd, a problem addressed in the following work.

2. The problem of hard examples in crowd counting is identified, and for the regression task of crowd counting a hard example focusing algorithm is proposed to solve it. In principle, the algorithm reduces the weight of easy samples, which in turn increases the model's attention to hard examples. For scale variation, a novel multi-scale semantic refining strategy is proposed: the model first predicts segmentation maps at different scales, and these segmentation maps, which carry semantic priors, are then mapped back into the network to adaptively extract and refine multi-scale features. This strategy breaks the limits of deep-learning-based crowd counting and enables lower convolutional layers to capture the semantic concept of the crowd as well. Experimental results show that our model focuses well on hard examples and is strongly robust to scale variation in the crowd.

3. Most current crowd counting methods based on deep learning
struggle to work well in unseen scenes. To improve the generalization ability of the model and enable it to adapt to any surveillance scene, a few-shot crowd counting model is proposed, which requires only one annotated image (the support image) from the target scene. The model can then adapt to the target scene and accurately predict the crowd count in it. To fully extract features from the limited support images, multiple prototypes of foreground and density for the support image (SFD-MP) are proposed and optimized with the EM algorithm. Additionally, to sufficiently guide the model's predictions on query images, a CNN together with SFD-MP guides the model locally, while a transformer together with SFD-MP guides it globally. The effectiveness of the model is validated on multiple video surveillance datasets.

4. The above methods all follow the fully supervised training paradigm and require a large number of costly dense annotations. To reduce the annotation cost of crowd data, a multi-task semi-supervised crowd counting model is proposed, comprising a classification task, a density regression task, and a segmentation task. To exploit massive unlabeled data during training, corresponding pseudo-labels are proposed for each task. In addition, to prevent unlabeled data with poor predictions from misleading the model, the relationship between the classification and density regression tasks is explored to propose a global self-correction strategy, and a local self-correction strategy is proposed from the density regression and segmentation tasks. Experimental results on multiple benchmark datasets show that our model can accurately predict crowd counts by exploiting a large amount of unlabeled data.
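The confidence-map scheme in point 1 can be illustrated with a minimal NumPy sketch. The array shapes, the `sigmoid` confidence head, and the toy values are illustrative assumptions; the dissertation's actual maps are produced by convolutional networks.

```python
import numpy as np

def sigmoid(x):
    """Squash confidence-branch logits into per-pixel head probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

def suppress_background(rough_density, confidence_logits):
    """Multiply the rough density map by the confidence map so that pixels
    unlikely to belong to a human head contribute little to the count."""
    confidence = sigmoid(confidence_logits)
    return rough_density * confidence

# Toy 2x2 example: the right column is background (very low confidence).
rough = np.array([[0.8, 0.5],
                  [0.6, 0.4]])
logits = np.array([[ 6.0, -6.0],
                   [ 6.0, -6.0]])
refined = suppress_background(rough, logits)
# Background pixels are pushed toward zero, so the total count
# (the sum of the density map) drops accordingly.
```

The key design point is that the suppression is soft: confident head pixels keep almost their full density value, while uncertain pixels are attenuated rather than hard-masked.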
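The hard example focusing algorithm in point 2 is in the spirit of focal re-weighting: down-weight pixels the model already fits well so the loss is dominated by hard examples. The sketch below is a minimal interpretation of that idea; the normalized-error weight `(error / max_error) ** gamma` is an illustrative assumption, not the dissertation's exact formula.

```python
import numpy as np

def focusing_weights(pred, target, gamma=2.0, eps=1e-8):
    """Per-pixel weights that vanish for easy examples (small error)
    and approach 1 for the hardest example (largest error)."""
    error = np.abs(pred - target)
    return (error / (error.max() + eps)) ** gamma

def hard_example_focusing_loss(pred, target, gamma=2.0):
    """Squared error re-weighted so hard examples dominate the loss."""
    w = focusing_weights(pred, target, gamma)
    return float(np.mean(w * (pred - target) ** 2))

pred   = np.array([0.10, 0.12, 0.90])   # last pixel is badly mispredicted
target = np.array([0.10, 0.10, 0.10])
w = focusing_weights(pred, target)       # easy pixels get near-zero weight
loss_focused = hard_example_focusing_loss(pred, target)
```

With `gamma > 0` the perfectly predicted pixel receives zero weight and the nearly correct one a negligible weight, so gradient signal concentrates on the mispredicted pixel.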
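The global self-correction strategy in point 4 exploits the agreement between the classification branch and the density regression branch: an unlabeled sample's pseudo-label is trusted only if the count integrated from the predicted density map falls inside the count interval predicted by the classification branch. The sketch below assumes hypothetical count intervals and a simple keep/discard rule, both illustrative rather than the dissertation's exact design.

```python
import numpy as np

# Hypothetical count intervals for the classification branch.
COUNT_BINS = [(0, 50), (50, 200), (200, float("inf"))]

def globally_consistent(density_map, predicted_class):
    """Keep an unlabeled sample only if the count integrated from the
    predicted density map lies in the interval the classifier predicted."""
    count = float(density_map.sum())
    lo, hi = COUNT_BINS[predicted_class]
    return lo <= count < hi

# A density map whose integral is about 120 people.
dmap = np.full((10, 10), 1.2)
keep_mid = globally_consistent(dmap, predicted_class=1)  # 50 <= 120 < 200
keep_low = globally_consistent(dmap, predicted_class=0)  # 120 not in [0, 50)
```

Samples where the two branches disagree are the ones most likely to carry misleading pseudo-labels, so discarding them keeps poor predictions from corrupting the semi-supervised training signal.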