| Similarity measurement plays an important role in many visual tasks, including person re-identification, image clustering, and image classification. To improve the performance of similarity measurement, context similarity measurement methods based on diversified constraints and deep neural networks have become an active research direction. The main work of this thesis is as follows. First, similarity measurement between sample features often makes little use of contextual information, which degrades the re-ranking performance of person re-identification. We address the re-ranking problem with a context similarity measurement module built from a graph structure and an attention mechanism. By integrating the attention mechanism into a graph convolutional network, the graph model is applied to the feature subset determined by the initial ranking. On the one hand, the aggregation operation of the graph convolutional network captures the contextual information between sample features when computing their similarities. On the other hand, a channel attention mechanism strengthens the contribution of relevant feature channels and further improves the similarity evaluation ability of the whole network. Experiments show that the proposed network structure has significant advantages on three person re-identification datasets with supervised signals. Second, in practice, a large amount of image data has no label information, and the target classes show considerable variation. We believe that insufficient attention to hard sample pairs in network modeling and a one-sided treatment of similarity measurement during merging make the task more difficult. To alleviate these problems, we propose an unsupervised network that gradually optimizes feature representation under the guidance of diversity context similarity. Feature learning is guided by
gradually generating high-quality labels, thereby narrowing the performance gap between unsupervised and supervised learning. Experiments show that the proposed model achieves superior performance in unsupervised feature generation and image retrieval tasks. Finally, image content is complex, and the presence of multiple semantic objects (subjects) in an image may confuse the training direction of the network, which makes the learned features less discriminative. We synthesize a multi-subject image dataset from several classic image datasets and propose an intention model based on the diversity context similarity measurement method as a baseline for this problem. Experiments show that, under different supervision signals, intention information benefits feature generation for the target subject, and that the supervision signal makes the intention more effective. In addition, the proportion of the image occupied by the target subject also affects performance. |