| In recent years, with the development of deep learning, convolutional neural network algorithms have made great progress on a wide range of visual tasks, relying on the abundant data available on the Internet and on their powerful feature extraction capabilities. Multi-object tracking and segmentation is a novel machine vision task that integrates detection, tracking and segmentation. It requires instance-level segmentation of the objects in each video frame and track association across frames, and it has broad application value in many practical scenarios. Based on deep convolutional neural networks, this paper carries out exploratory research on multi-object tracking and segmentation algorithms built on spatio-temporal feature fusion, and achieves the following results. First, a multi-object tracking network framework based on association feature enhancement is proposed, which strengthens the expression of association features in both time and space. The backbone network first extracts features and aggregates information from different levels. A feature separation module is then designed, which enhances the association features in the spatial dimension by means of different attention mechanisms. This module not only separates the features of the detection branch from those of the appearance embedding branch, but also resolves the conflict between the optimization directions of the two branches. Next, a long-term appearance memory module is designed to strengthen the association features in the temporal dimension. By feeding the appearance information of a track over multiple frames into a long short-term memory module, the network fuses multi-frame appearance features, which addresses the frequent occlusion and large deformation encountered in multi-object tracking. The two modules enhance the association features in the spatial and temporal dimensions respectively, so that detections are reliably associated with existing tracks. The proposed method achieves superior performance on two public multi-object tracking benchmark datasets, with an improvement of 0.4 percentage points over the corresponding method on the difficult MOT17 dataset. Second, building on this multi-object tracking algorithm, the paper explores tracking at a finer granularity and studies multi-object tracking and segmentation. A multi-object tracking and segmentation algorithm based on the fusion of spatio-temporal features is proposed, whose network consists of a 2D encoder and a 3D decoder. Multiple consecutive frames are first fed into the 2D encoding layers to extract image features at different resolutions. Then, starting from the low-resolution features, a spatial 3D attention module extracts the important spatial features, and a temporal-compression self-attention module extracts temporal features that carry key-frame information. These two features are fused with the original features and fed, together with the higher-resolution features, into the 3D convolution layers. Features from different levels are aggregated repeatedly in this way, yielding multiply fused features that contain both critical temporal information and important spatial information, from which the tracking and segmentation results are finally obtained. The algorithm is trained end to end, and its metrics on two public multi-object tracking and segmentation benchmark datasets are significantly better than those of contemporaneous methods, with an improvement of 0.9 percentage points on the difficult KITTI MOTS dataset. |
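
The benefit of fusing a track's appearance over several frames before association can be illustrated with a minimal sketch. This is not the network proposed in the paper: an exponential moving average stands in for the long short-term memory module, greedy cosine-similarity matching stands in for the association step, and every name and value (`fuse_track_appearance`, `associate`, `momentum`, `min_similarity`, the toy embeddings) is a hypothetical choice made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def fuse_track_appearance(frames, momentum=0.8):
    """Fuse per-frame embeddings (oldest first) into one track descriptor.

    Assumption: a simple exponential moving average replaces the learned
    LSTM-based memory described in the abstract.
    """
    fused = l2_normalize(frames[0])
    for emb in frames[1:]:
        emb = l2_normalize(emb)
        mixed = [momentum * f + (1 - momentum) * e for f, e in zip(fused, emb)]
        fused = l2_normalize(mixed)
    return fused


def associate(track_histories, detections, min_similarity=0.5):
    """Greedily match tracks to detections; returns {track_id: detection_index}."""
    fused = {tid: fuse_track_appearance(h) for tid, h in track_histories.items()}
    # Score every (track, detection) pair and visit the best pairs first.
    pairs = sorted(
        ((cosine(f, d), tid, j)
         for tid, f in fused.items()
         for j, d in enumerate(detections)),
        reverse=True,
    )
    matches, used = {}, set()
    for sim, tid, j in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches


# Toy data: the last embedding of track 1 is corrupted, as if the object
# were partly occluded by the object followed by track 2.
histories = {
    1: [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]],
    2: [[0.0, 1.0], [0.1, 0.9]],
}
detections = [[0.05, 0.95], [0.95, 0.05]]
print(associate(histories, detections))
```

In this toy case the most recent embedding of track 1 alone is closer to detection 0, whereas the fused descriptor still matches detection 1, which is the kind of robustness to occlusion that multi-frame appearance fusion is meant to provide.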