| In recent years, with the rapid development of deep learning, artificial intelligence-related concepts and applications such as smart cities, smart security, and autonomous driving have been proposed, and real-time pedestrian tracking plays an important role in them. Pedestrian tracking is one of the important sub-tasks of multi-target tracking. Algorithms for this task usually divide the tracking process into two stages: detection and matching. The detection stage marks the pedestrian targets in an image with bounding boxes in order to locate each pedestrian, and the matching stage associates the pedestrian targets in the previous and current frames, so that the same pedestrian keeps the same identity and different pedestrians receive different identities. Building on the classic Deep SORT tracking algorithm, this paper raises tracking accuracy and speed by improving the YOLOv5 algorithm used in the detection stage and the re-identification algorithm used in the matching stage. For detection, the paper weighs model parameter count against prediction accuracy, adopts the single-stage detector YOLOv5 for pedestrian detection, and makes three improvements to it. First, in the network structure, a small-target detection head is added to YOLOv5 to reuse the low-level features of the backbone network and improve the algorithm's ability to detect small targets. Second, for feature fusion, the paper uses the Swin Transformer structure to fuse the extracted features, exploiting the attention network's strengths in extracting global features and fusing contextual features to improve the algorithm's multi-scale feature fusion ability. Finally, the paper modifies the CIoU part of the loss function by introducing the Box-Cox generalized power transformation, which up-weights the loss and gradient of high-IoU targets and thereby improves detection accuracy. For re-identification, the paper replaces the traditional residual network with OSNet and improves the OSNet structure. The main change is to add instance normalization (IN) layers to OSNet to weaken the influence of image style on feature extraction; the paper then constructs a neural architecture search space and obtains the optimal insertion positions for the IN layers by automatic search, improving the cross-domain generalization performance of the algorithm. Finally, for the tracking algorithm, the paper combines the improved YOLOv5 detector with the improved OSNet re-identification model: the improved YOLOv5 produces the bounding boxes for each video frame, a Kalman filter and the improved OSNet perform motion modeling and appearance feature extraction on those boxes, and cascade matching associates the boxes of consecutive frames, yielding a high-precision real-time pedestrian tracking system. |
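
The statement that an instance normalization (IN) layer weakens the influence of image style can be illustrated with a minimal sketch. It is not the paper's OSNet implementation: the function name `instance_norm`, the pure-Python list layout (channels x height x width), and the toy 2x2 feature map are assumptions made for illustration, and the learnable scale and shift that IN layers often carry are omitted. The sketch normalizes each channel of a single sample over its spatial positions, then checks that a copy of the feature map with changed contrast and brightness normalizes to almost the same values.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of ONE sample over its spatial positions.

    feature_map: list of channels, each a list of rows (C x H x W).
    Learnable affine parameters are intentionally left out of this sketch.
    """
    normalized = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([[(v - mean) * scale for v in row] for row in channel])
    return normalized


# Hypothetical single-channel 2x2 feature map.
image = [[[1.0, 2.0], [3.0, 4.0]]]

# The same content with a different "style": contrast x2, brightness +10.
restyled = [[[2.0 * v + 10.0 for v in row] for row in channel] for channel in image]

out_a = instance_norm(image)
out_b = instance_norm(restyled)

# Largest element-wise difference between the two normalized maps.
max_diff = max(
    abs(a - b)
    for ch_a, ch_b in zip(out_a, out_b)
    for row_a, row_b in zip(ch_a, ch_b)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Because the per-channel mean and variance are computed from each sample alone, a per-channel affine change of intensity is removed up to the small `eps` term, which is one common reading of "style" in the cross-domain re-identification setting. Where such layers are best placed inside the network is the question the paper addresses with its search space; this sketch does not cover that step.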