| Target recognition and tracking is an important research topic in computer vision. Its main purpose is to identify a target in an image or video and to locate it accurately in each video frame, so as to achieve continuous identification and stable tracking of a specific target in real scenes. Target recognition and tracking technology has been widely used in public security, criminal investigation, UAV monitoring, robot navigation, intelligent transportation video monitoring, video retrieval, and many other fields. However, owing to the interaction of many factors, such as illumination variation, scale variation, appearance variation, occlusion, motion blur, out-of-plane rotation, and in-plane rotation, the performance of existing target recognition and tracking algorithms in complex scenes is not satisfactory. In complex scenes, three factors typically cause these algorithms to degrade or even fail. First, the low resolution of images and videos reduces the distance between different features. Second, changes in shooting angle, lens parameters, and illumination conditions make the same object show obvious appearance changes under different cameras. Third, frequent occlusions make the features of specific objects unstable or even lost, so that the intra-class difference becomes greater than the inter-class difference. Therefore, a robust and efficient target recognition and tracking algorithm with strong anti-interference ability remains a challenging research problem in computer vision. This paper therefore focuses on methods for target recognition and tracking in complex scenes, and aims to build more robust target recognition and video tracking algorithms by introducing sparse constraints, an attention mechanism, and a feature-channel-weighted discriminant correlation filter, thereby enhancing robustness to low 
resolution, appearance changes, and occlusions in complex scenes. The main contributions are as follows: (1) Consistent Sparse Representation for Video-Based Face Recognition. First, a prior hypothesis of label consistency is proposed: the dictionaries corresponding to samples of the same class should be grouped together. The dictionary of the data set is therefore assumed to have a block structure and, because the images in the probe set are related to each other, the whole probe set can be reconstructed as a linear representation that uses the smallest number of dictionary blocks. Second, because the $l_{F,0}$ mixed norm counts the number of non-zero blocks in the reconstruction coefficient matrix, it is used to constrain the coefficient matrix, yielding linear representation coefficients that are sparser and have a more pronounced block structure than those obtained with the $l_{2,1}$ mixed norm. Finally, a CSR-l1 model is proposed by combining the CSR model with the $l_1$ norm. Because CSR-l1 combines atom-level sparsity and group-level sparsity, it improves recognition accuracy. Experimental results show that the proposed method is more competitive than state-of-the-art video-based face recognition methods. (2) Attention-Aware Adversarial Network for Person Re-Identification. First, a novel image data augmentation method at the feature-map level is proposed. By occluding different regions of the feature map in turn, the diversity of features is increased. Second, to deal with occlusion and other challenges in the recognition process, an attention assignment mechanism and an attention-aware adversarial network are proposed. In this network, the occluded feature map with the lowest classification accuracy is selected as the hard sample and is fed, together with the initial attention map, into the adversarial loss function to train the network and generate a more precise attention map, so as to allocate attention to multiple 
important areas of the target object. Finally, the attention map is integrated with the representative feature maps of the person image by element-wise multiplication. The resulting feature maps are called attention-aware feature maps. Together with the original representative feature maps, the occluded attention-aware feature maps are fed into the subsequent classification network to train the attention-aware adversarial network. Experimental results show that the proposed method performs favorably against state-of-the-art methods. (3) Consistent Sparse Representation and Weighted Discriminant Correlation Filter Based Tracking Methods. First, each candidate particle obtained by the particle filter is perturbed to generate a small video stream. At the same time, the features of non-target objects are added to the basis of the observed target subspace, so that target tracking is transformed into a binary classification problem on video streams. Consistent sparse representation is then used for classification, which improves the robustness of target tracking. Second, to address the problem that C-COT assigns the same confidence to every feature channel, the Average Peak-to-Correlation Energy (APCE) is used to evaluate the response map of each feature channel, guiding the target appearance model to assign different weights to different features. The final weighted feature response map is then obtained, and its peak is used to locate the target. Finally, to avoid interference from similar backgrounds and the overfitting caused by updating the model in every frame in C-COT, a Peak Sidelobe Ratio (PSLR) criterion is adopted for model updating. Experimental results show that, compared with the C-COT algorithm and nine other typical algorithms, the performance of the proposed method is significantly improved. |
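
The APCE-based channel weighting described in contribution (3) can be illustrated with a minimal sketch. It uses the commonly cited APCE definition, (F_max - F_min)^2 divided by the mean of (F - F_min)^2 over the response map. The normalisation of APCE scores into weights, the function names `apce` and `fuse_responses`, and the synthetic Gaussian response maps are all assumptions made for illustration; the abstract does not state the exact weighting rule used by the proposed tracker.

```python
import numpy as np


def apce(response, eps=1e-12):
    """Average Peak-to-Correlation Energy of a single 2-D response map."""
    f_max = response.max()
    f_min = response.min()
    # Sharp, single-peaked maps give a large value; flat or noisy maps a small one.
    return (f_max - f_min) ** 2 / (np.mean((response - f_min) ** 2) + eps)


def fuse_responses(responses):
    """Weight per-channel response maps by normalised APCE (assumed rule).

    responses: array of shape (channels, height, width).
    Returns the fused map, the channel weights, and the (row, col) of the peak.
    """
    scores = np.array([apce(r) for r in responses])
    weights = scores / scores.sum()
    fused = np.tensordot(weights, responses, axes=1)  # weighted sum over channels
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, weights, tuple(int(i) for i in peak)


def gaussian_map(shape, center, sigma):
    """Synthetic response map used only for this demonstration."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


# One confident channel (sharp peak) and one ambiguous channel (broad peak).
sharp = gaussian_map((32, 32), center=(10, 12), sigma=1.5)
broad = gaussian_map((32, 32), center=(3, 4), sigma=8.0)
fused, weights, peak = fuse_responses(np.stack([sharp, broad]))
print("channel weights:", weights)
print("estimated target position (row, col):", peak)
```

In this sketch the sharply peaked channel receives most of the weight, so the fused peak follows it rather than the broad, ambiguous channel. A similar confidence score could gate model updates, which is the role the abstract assigns to PSLR.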