Frequent production accidents in chemical enterprises cause large numbers of casualties, so effectively ensuring the safety of personnel in chemical enterprises has become an urgent problem [78]. Although monitoring equipment is now widely deployed in chemical production areas and enables real-time on-site monitoring, effective measures for managing gatherings of operators and preventing accidents are still lacking. Crowd counting is an important component of intelligent monitoring systems. Its task is to accurately estimate the total number of people in an image and to give the distribution of crowd density [2], which is of great significance for safety early warning and for scheduling and planning in chemical enterprises. In recent years, with the rapid development of deep learning, many crowd counting methods have been proposed, and good counting performance has been achieved in some specific scenarios. However, the task still faces several serious challenges, such as drastic variation in the scale of people, interference from complex backgrounds, and the fact that a count alone cannot meet practical needs. To address these problems and further improve counting accuracy, this paper studies both the network structure and the loss function and proposes specific solutions. The main research work and contributions of this paper are as follows:

1. To address scale variation and background interference in images, we present a novel crowd counting method, called the Scale and Background aware Asymmetric Bilateral Network (SBAB-Net), which handles scale variation and background noise in a unified framework. Specifically, SBAB-Net contains three main components: a pre-trained backbone convolutional neural network (CNN) that serves as the feature extractor, and two asymmetric branches that generate the density map. The two branches have different structures and use features from different semantic layers. One branch is a densely connected stacked dilated convolution (DCSDC) sub-network with different dilation rates; it relies on a single deep feature layer and handles scale variation. The other branch is a parameter-free densely connected stacked pooling (DCSP) sub-network with various pooling kernels and strides; it relies on shallow features and fuses features with several receptive fields to reduce the impact of background noise. The two sub-networks are fused by an attention mechanism to generate the final density map.

2. Because a count alone cannot meet practical needs, a purely point-based crowd counting and localization network built on point matching is proposed. Rather than only computing the absolute counting error at the image level, it also produces a more fine-grained estimate, namely the position of each individual. The network directly uses point annotations as the learning target and predicts a set of points, each with head coordinates and a confidence score. For evaluation, density-normalized average precision is used to give a more comprehensive and accurate assessment of the network's localization and counting performance. One-to-one point matching is used to associate ground-truth targets with predicted points, which helps improve the density-normalized average precision. This simple, intuitive, and efficient design achieves state-of-the-art counting performance and promising localization accuracy.
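Why the DCSDC branch can handle scale variation becomes clearer with some receptive-field arithmetic. The sketch below is a minimal illustration, not the implementation used in this paper. It assumes 3×3 kernels, stride 1, DenseASPP-style dense connectivity (every layer sees the input and all earlier outputs, and everything is concatenated at the end), and hypothetical dilation rates of 1, 2 and 3, because the actual rates are not stated here. A k×k kernel with dilation d spans k + (k - 1)(d - 1) pixels. With dense connections, any subset of layers forms a path to the output, so the output mixes many receptive-field sizes rather than just one.

```python
from itertools import combinations


def effective_kernel(k: int, d: int) -> int:
    """Span in pixels of a k x k kernel with dilation rate d (stride 1)."""
    return k + (k - 1) * (d - 1)


def dense_receptive_fields(rates, k=3):
    """Receptive-field sizes reachable in a densely connected dilated-conv stack.

    Assumes DenseASPP-style connectivity: each layer receives the input plus all
    earlier outputs, so every subset of layers is a path to the final concat.
    """
    increments = [effective_kernel(k, d) - 1 for d in rates]
    fields = set()
    for r in range(len(increments) + 1):
        for subset in combinations(increments, r):
            fields.add(1 + sum(subset))  # size 1 corresponds to the identity path
    return sorted(fields)


def chain_receptive_field(rates, k=3):
    """Receptive field of the same layers stacked as a plain chain."""
    return 1 + sum(effective_kernel(k, d) - 1 for d in rates)


rates = [1, 2, 3]  # hypothetical dilation rates
print(dense_receptive_fields(rates))  # [1, 3, 5, 7, 9, 11, 13]
print(chain_receptive_field(rates))   # 13
```

With these assumed rates, the densely connected stack exposes seven receptive-field sizes at its output, whereas a plain chain exposes only its final 13×13 field. This diversity is what lets a single deep feature layer respond to heads of very different sizes.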
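The DCSP branch is described as parameter-free: it only pools, so it adds no learnable weights. The following is a simplified, hypothetical sketch for a single-channel shallow feature map. It chains stride-1 max-pooling layers with "same"-size output and fuses the input with every intermediate output by averaging. The use of max pooling, stride 1 (which avoids the upsampling that different strides would require), and averaging as the fusion rule are all assumptions for illustration.

```python
def max_pool_same(x, k):
    """Stride-1 max pooling with an odd k x k window; output has the same size as x.

    Windows are clipped at the borders, which is equivalent to -inf padding.
    """
    h, w, r = len(x), len(x[0]), k // 2
    return [[max(x[a][b]
                 for a in range(max(0, i - r), min(h, i + r + 1))
                 for b in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def stacked_pooling(x, kernels=(3, 3, 3)):
    """Chain parameter-free pooling layers and average the input with every output."""
    outputs, current = [x], x
    for k in kernels:
        current = max_pool_same(current, k)  # receptive field grows by k - 1 per step
        outputs.append(current)
    n, h, w = len(outputs), len(x), len(x[0])
    return [[sum(o[i][j] for o in outputs) / n for j in range(w)] for i in range(h)]


# Hypothetical 5x5 shallow feature map with one strong response in the center.
feat = [[0.0] * 5 for _ in range(5)]
feat[2][2] = 1.0
fused = stacked_pooling(feat)
print(fused[2][2], fused[1][1], fused[0][0])  # 1.0 0.75 0.5
```

The fused map keeps the peak at 1.0 and spreads progressively weaker values outward, so every position aggregates context from several window sizes without a single trainable parameter.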
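The attention-based fusion of the two branches can be sketched at the density-map level. In this sketch, each pixel gets a softmax over two attention logits (one per branch), the fused density is the weighted sum of the two branch densities, and the estimated count is the sum of the fused map. Fusing density maps rather than intermediate features, and the hard-coded logits, are assumptions for illustration only; the paper does not specify how its attention weights are produced.

```python
import math


def attention_fuse(dens_a, dens_b, logit_a, logit_b):
    """Fuse two density maps with per-pixel softmax attention weights.

    All four arguments are equally sized 2D lists (rows of floats).
    """
    fused = []
    for ra, rb, la, lb in zip(dens_a, dens_b, logit_a, logit_b):
        row = []
        for va, vb, sa, sb in zip(ra, rb, la, lb):
            m = max(sa, sb)  # subtract the max for numerical stability
            ea, eb = math.exp(sa - m), math.exp(sb - m)
            wa = ea / (ea + eb)  # softmax weight of branch A; branch B gets 1 - wa
            row.append(wa * va + (1.0 - wa) * vb)
        fused.append(row)
    return fused


def estimate_count(density):
    """The predicted count is the integral (sum) of the density map."""
    return sum(sum(row) for row in density)


# Hypothetical 2x2 outputs of the DCSDC and DCSP branches.
dcsdc = [[0.2, 0.0], [0.0, 0.8]]
dcsp = [[0.6, 0.4], [0.0, 0.0]]
# Equal attention everywhere except the bottom-right pixel, which favors DCSDC.
logits_a = [[0.0, 0.0], [0.0, 10.0]]
logits_b = [[0.0, 0.0], [0.0, 0.0]]
fused = attention_fuse(dcsdc, dcsp, logits_a, logits_b)
print(round(estimate_count(fused), 3))  # 1.4
```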
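One-to-one point matching can be illustrated with a tiny brute-force assignment. This is a sketch under stated assumptions: the matching cost tau * distance - confidence (which favors confident predictions close to an annotated head) and the weight tau = 0.05 are hypothetical choices, and exhaustive search over injective assignments is only feasible for a handful of points. A practical implementation would solve the same problem with the Hungarian algorithm.

```python
import math
from itertools import permutations


def match_points(preds, confs, gts, tau=0.05):
    """Return {gt_index: pred_index}, a one-to-one assignment of minimum total cost.

    Hypothetical cost: tau * Euclidean distance - prediction confidence.
    Brute force over all injective assignments; assumes len(preds) >= len(gts).
    """
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(preds)), len(gts)):
        # perm[g] is the prediction assigned to ground-truth point g
        cost = sum(tau * math.dist(preds[p], gts[g]) - confs[p]
                   for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {g: p for g, p in enumerate(best_perm)}


preds = [(10.0, 10.0), (50.0, 52.0), (90.0, 90.0)]  # predicted head points
confs = [0.9, 0.8, 0.1]                             # predicted confidences
gts = [(48.0, 50.0), (12.0, 9.0)]                   # annotated head points
print(match_points(preds, confs, gts))  # {0: 1, 1: 0}
```

Because each annotation is matched to exactly one prediction, duplicate detections of the same head cannot all be counted as hits. Prediction 2, which is far from every annotation and has low confidence, stays unmatched and would be supervised as background.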