| Scene recognition and landmark recognition have each achieved good results in their own application scenarios. However, actual business requirements show that scene data and landmark data usually appear together. Using two separate models for the scene recognition task and the landmark recognition task makes the visual pipeline cumbersome and wastes resources. Training a single classification model on both scene and landmark data produces many false detections in actual testing, and the results are unsatisfactory. This thesis proposes a joint training model based on supervised contrastive learning that is trained on two types of datasets, a scene dataset and a landmark dataset, and achieves good recognition results. First, the dataset required for the experiments is constructed from a subset of Places365 and self-built data. It contains 20 scene categories and 5 landmark categories, with 617,835 images in total. To improve the robustness and generalization of the model, three kinds of data augmentation are applied during training: color augmentation, blurring, and horizontal flipping. Second, this thesis constructs a joint training model based on supervised contrastive learning. During training, the scene dataset is trained as an image classification task and the landmark dataset as an image retrieval task. In addition, the joint training model uses supervised contrastive learning as auxiliary training: the joint training loss is a weighted sum of the cross-entropy loss and the Additive Angular Margin Loss (ArcFace loss), and the model is optimized with this loss. To improve recognition, intra-class differences are reduced and inter-class differences are enlarged. During inference, the query image is processed in one of two modes, classification mode or retrieval mode, to obtain the recognition result. Finally, a self-comparison experiment and a cross-comparison experiment were designed to test the classification model, the classification model with negative samples, and the joint training model on a labeled test set of 61,783 images and an unlabeled traffic test set of 300,000 images. On the labeled test set, the joint training model assisted by supervised contrastive learning achieved an overall accuracy, precision, recall, and F1 score of 0.972, 0.973, 0.971, and 0.972, respectively, in classification mode. In retrieval mode, its overall accuracy, precision, recall, and F1 score were all 0.974. On the unlabeled traffic test set, in classification mode with the threshold set to 0.7, the average precision over the 20 scene categories was 92.02% with a false alarm rate of 0.018%, and the average precision over the 5 landmark categories was 93.78% with a false alarm rate of 0.034%. In retrieval mode with the threshold set to 0.7, the average precision over the 20 scene categories was 93.39% with a false alarm rate of 0.017%, and the scores of most falsely detected images decreased, so recognition precision can be improved further by adjusting the threshold. The average precision over the 5 landmark categories was 94.92% with a false alarm rate of 0.012%. Compared with the classification model, the joint training model improves recognition substantially on both the labeled test set and the unlabeled traffic test set. The joint training model assisted by supervised contrastive learning also performs better than the existing model on both the unlabeled and labeled datasets. The final verification results show that adding a negative sample category significantly improves recognition precision, and that the joint training model based on supervised contrastive learning achieves better recognition results. |
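
The joint training loss described in the abstract, a weighted sum of the cross-entropy loss and the ArcFace loss, can be sketched roughly as follows. This is a minimal pure-Python illustration for one scene sample and one landmark sample, not the thesis implementation. The function names, the `weight` parameter with its `w` and `1 - w` form, and the defaults `scale=30.0` and `margin=0.5` are all assumptions, since the abstract gives neither the loss weights nor the ArcFace hyperparameters. The supervised contrastive term is omitted.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross entropy for a single sample, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def arcface_logits(cosines, target, scale=30.0, margin=0.5):
    """Apply an additive angular margin to the target class, then rescale.

    `cosines` holds cos(theta_j) between the normalized embedding and each
    normalized class weight. Simplification: the usual special handling for
    theta + margin > pi is left out of this sketch.
    """
    out = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos within its domain
        if j == target:
            c = math.cos(math.acos(c) + margin)
        out.append(scale * c)
    return out


def joint_loss(scene_logits, scene_label, landmark_cosines, landmark_label,
               weight=0.5):
    """Weighted sum of the scene cross entropy and the landmark ArcFace loss.

    `weight` is a hypothetical hyperparameter; the source does not state it.
    """
    ce = cross_entropy(scene_logits, scene_label)
    arc = cross_entropy(arcface_logits(landmark_cosines, landmark_label),
                        landmark_label)
    return weight * ce + (1.0 - weight) * arc


if __name__ == "__main__":
    # One scene sample (3 classes) and one landmark sample (2 classes).
    print(joint_loss([2.0, 0.5, -1.0], 0, [0.8, 0.1], 0, weight=0.5))
```

The margin lowers the target-class logit before the softmax, so the landmark embedding has to sit closer to its class center to reach the same loss. That is one way of reducing intra-class differences and enlarging inter-class differences, as the abstract describes.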