| As material living standards improve, social activity increasingly concentrates in crowded venues such as airports, stations, and large commercial centers, which poses major challenges for public management and public safety. Traditional video surveillance systems require personnel to remain on duty for long periods, and abnormal events are difficult to report in time. Designing an intelligent dense-crowd monitoring system that analyzes crowd density dynamically in real time and gives early warning of abnormal conditions therefore has important practical value, because it helps managers prevent serious incidents before they occur. In recent years, crowd counting algorithms based on machine vision and deep convolutional neural networks have become a research hotspot. However, existing crowd counting algorithms are aimed at outdoor scenes, and research on crowd counting and density estimation in indoor scenes relies on face recognition or pedestrian recognition. Because indoor surveillance cameras are installed at widely varying angles and indoor crowds are unevenly distributed, face recognition and pedestrian recognition are severely limited. This paper therefore adopts deep convolutional neural networks and uses head detection to overcome the limitations of other detection methods in indoor scenes: it first detects the indoor crowd, then computes the number of people and regresses the crowd density map from the detection results. Indoor head detection still faces the following challenges, for each of which this paper proposes a solution. (1) Because of the camera installation angle, heads are unevenly distributed in the image, and heads far from the camera appear at a small scale; in addition, dense crowds easily lead to a high missed-detection rate. This paper therefore uses dilated convolution to construct a multi-scale feature extraction module
for extracting multi-scale information; based on how features propagate at different scales, a differentiated feature fusion module is designed to fuse feature layers of different scales. These modules are then combined into a multi-scale head detection network. (2) Because individuals move freely, the crowd distribution is disordered and the density varies widely, so networks that use a fixed convolution kernel produce errors caused by the uneven distribution of sample features. This paper therefore uses a spatial attention module to extract global information and designs a layer attention module that fuses global and local information to capture the target distribution, thereby achieving crowd detection and counting. (3) Indoor backgrounds are complex, and low-level features of the human head, such as color and shape, are easily confused with those of other objects in the background, which raises the misrecognition rate of crowd detection. Based on analysis, this paper divides background feature interference into two categories. The first is interference from the features of objects other than the human body, for which a hybrid attention module is constructed to guide the network to pay more attention to the target region. The second is interference from the human head itself, for which a central receptive field module is constructed that simulates the receptive field of human vision to extract key features of the target. Experimental results show that, compared with similar algorithms, the proposed method achieves significant improvements in recall, counting accuracy, and counting metrics. The work also contributes new design ideas to research on deep convolutional networks for intelligent monitoring. |
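
The step of deriving a count and a density map from head detections can be illustrated with a minimal sketch. The abstract does not say how the density map is built, so the approach here is an assumption: each detected head center contributes one Gaussian blob normalized to unit mass, which makes the sum of the map equal to the number of detected heads. The function names `density_map_from_heads` and `count_from_density`, the `sigma` parameter, and the pixel coordinates are hypothetical and chosen only for illustration.

```python
import math


def density_map_from_heads(head_centers, height, width, sigma=2.0):
    """Place one unit-mass Gaussian per detected head center (x, y).

    Assumption: a fixed-width Gaussian kernel; the paper may use a
    different kernel or a learned regression instead.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at three standard deviations
    for cx, cy in head_centers:
        # Collect the kernel weights that fall inside the image bounds.
        weights = {}
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                weights[(y, x)] = math.exp(-dist_sq / (2 * sigma ** 2))
        if not weights:
            continue  # head center lies entirely outside the image
        total = sum(weights.values())
        # Normalize so every head adds exactly 1 to the map, even near borders.
        for (y, x), w in weights.items():
            density[y][x] += w / total
    return density


def count_from_density(density):
    """The estimated crowd count is the sum over the density map."""
    return sum(sum(row) for row in density)


# Hypothetical head centers in pixel coordinates (x, y).
heads = [(5, 5), (20, 10), (0, 0)]
density = density_map_from_heads(heads, height=32, width=32)
estimated_count = count_from_density(density)
```

Normalizing each blob separately keeps the count consistent for heads near the image border, where part of the kernel would otherwise be cut off.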