As a key technology in computer vision, pedestrian multi-target detection and tracking is widely used in autonomous driving, intelligent surveillance, and behavior recognition. With the continued advance of deep learning, deep-learning-based pedestrian multi-target detection and tracking algorithms achieve higher accuracy than traditional methods and show greater potential for development. However, practical application of this task still faces many challenges, such as false detections, missed detections, and target loss caused by complex backgrounds, dense crowds, occluded pedestrians, and pedestrian pose changes, as well as lag in real-time detection caused by complex models in on-device deployment. In this paper, we use ResNet and FairMOT as the base models and propose improvements that take these problems as the starting point. The main research contents and contributions are as follows: (1) To address false detections, missed detections, and target loss in low-illumination scenes, we investigate two aspects: global attention and low illumination. First, we study the construction of global information features from both the spatial and channel perspectives within the CNN receptive field, and strengthen the representation capability of the CNN by improving the quality of its global information encoding. Focusing on global relations, we propose a new architectural unit, the GA (Global Attention) module, which adaptively emphasizes the main features that distinguish foreground from background in the image by explicitly modeling the interdependencies between space and channels. Second, contrast-limited adaptive histogram equalization (CLAHE) is introduced at the inference stage for data from low-illumination scenes to reduce image noise and enhance image detail, and is adapted to the GA module to improve the tracking performance of the model
in low-illumination scenes. (2) We further study false detections, missed detections, and target loss, and efficiently combine the advantages of CNNs and Transformers. First, the global information dependence problem is addressed with the multi-head self-attention mechanism of the Transformer, which reduces the dependence on external information and computes the mutual influence between pixels or features. Second, in response to the high computational complexity of the Transformer, we introduce depthwise separable deconvolution to improve computational efficiency and propose a new architectural unit, the GCT (Global CNN Transformer) module. Then, the dense information exchange between different feature layers of the FPN (Feature Pyramid Network) is enhanced so that the detector processes high-level semantic information and low-level spatial information with equal priority in the early stage of the network; a cross-layer connection operation is introduced for this purpose, and the resulting structure is named CL-FPN (Cross Layer-FPN). (3) Model compression is investigated to optimize the pedestrian multi-target tracking model with three methods (pruning, quantization, and parameter reconstruction) in order to address lag in real-time detection in on-device deployment. This study focuses on the processing of model weights. First, the model is pruned by removing parameters according to a weight criterion. Second, the 32-bit floating-point model weights are mapped to 8-bit integers, and the dense weights are discretized with the K-Means++ clustering algorithm to achieve model quantization. Then, the model parameters are reconstructed by combining convolution and pooling operations, which reduces the amount of computation and the number of parameters and improves operational efficiency through operator fusion, with no or only a slight decrease in model accuracy. The above strategies are applied to the ResNet and FairMOT models and compared with the
original models. The CIFAR-10 dataset is used to verify the effectiveness of the strategies on the upstream classification task with the ResNet model, and the MOT17 dataset is used to verify their effectiveness on the pedestrian multi-target detection task with the FairMOT model. The experiments show that the above strategies effectively improve the pedestrian multi-target detection and tracking accuracy of the FairMOT model in complex scenes and effectively reduce false detections, missed detections, and target loss, as reflected in the improved IDF1 score. Finally, model compression improves the real-time detection and tracking performance of the multi-target tracking model.
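
The weight-processing steps in contribution (3) can be illustrated with a minimal sketch. It assumes magnitude-based pruning as the weight criterion and a plain affine mapping from 32-bit floats to 8-bit integers; the abstract does not give the exact criterion or mapping, and the K-Means++ discretization step is not reproduced here. The function names `prune_by_magnitude`, `quantize_int8`, and `dequantize_int8` are hypothetical and operate on a flat list of weights rather than on a real ResNet or FairMOT checkpoint.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Assumption: magnitude is used as the weight criterion; the paper may
    use a different one.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Affine-map float weights to integers in [-128, 127].

    Returns (codes, scale, zero_point). The range is widened to include 0
    so that pruned (zero) weights are represented exactly.
    """
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all weights are 0
    zero_point = round(-128 - lo / scale)
    codes = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate float weights from their 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


# Toy example: prune half of the weights, then quantize what remains.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, 0.1]
pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale, zero_point = quantize_int8(pruned)
restored = dequantize_int8(codes, scale, zero_point)
print(pruned)
print(codes)
print(restored)
```

In this sketch the reconstruction error of each weight is bounded by one quantization step (`scale`), and pruned weights stay exactly zero after dequantization, which is why pruning and quantization compose cleanly.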