| The image features extracted by deep learning image retrieval algorithms are rich, discriminative, and robust. However, when an image contains small-scale or occluded targets, mainstream retrieval models fail to achieve ideal results, and retrieval accuracy is low. We analyze the main causes of these two problems and propose two new retrieval models built on existing mainstream models. First, to address the low retrieval accuracy caused by small-scale targets, we propose a deep local feature aggregation retrieval model based on attention (DLAA). The feature map output by a convolutional layer of the Residual Network (ResNet) serves as the deep local features, the output of the fully connected layer serves as the global feature of the image, and an attention mechanism selects the key points among the deep local features. In addition, a new feature aggregation technique is designed so that more discriminative features receive higher weight in the image representation, improving model performance. Experimental results on three datasets, Google Landmarks, Oxford, and Holidays, show that DLAA effectively improves retrieval accuracy for small-scale targets. Second, to address the unsatisfactory retrieval results caused by target occlusion, we propose a multi-scale deep local feature fusion retrieval model (MFFM). Building on the original ResNet structure, the residual block is optimized to extract multi-scale deep local features. Then, combined with a hybrid attention network, a two-link Feature Pyramid Network (FPN) performs multi-scale deep local feature fusion, so that the fused features carry richer semantics and the semantic information of occluded targets is not excessively suppressed. Experimental results show that the multi-scale deep local feature fusion model improves retrieval accuracy in complex conditions such as target occlusion. |
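
The idea of giving more discriminative local features a higher weight in the image representation can be illustrated with a minimal sketch. It is not the DLAA aggregation itself, which the abstract does not specify. It assumes that each local descriptor already has a scalar attention score, that the scores are turned into weights with a softmax, and that the final descriptor is the L2-normalized weighted sum. The function names `softmax` and `attention_aggregate` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(local_features, attention_scores):
    """Pool local descriptors into one image descriptor.

    Assumption (not from the paper): weights are a softmax over the
    attention scores, and the pooled vector is L2-normalized.
    """
    if len(local_features) != len(attention_scores):
        raise ValueError("one attention score is required per local feature")

    weights = softmax(attention_scores)
    dim = len(local_features[0])

    # Weighted sum: descriptors with higher attention contribute more.
    pooled = [0.0] * dim
    for weight, feature in zip(weights, local_features):
        for i, value in enumerate(feature):
            pooled[i] += weight * value

    # L2-normalize so descriptors can be compared with a dot product.
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


if __name__ == "__main__":
    # Two toy 2-D local descriptors; the first has a much higher score.
    features = [[1.0, 0.0], [0.0, 1.0]]
    scores = [4.0, 0.0]
    print(attention_aggregate(features, scores))
```

In this sketch a small-scale target would benefit only if its local descriptors receive high attention scores; how those scores are learned is outside the scope of the example.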