Research On Image Retrieval Method Based On Feature Fusion Graph Attention Network

Posted on: 2024-02-21  Degree: Master  Type: Thesis
Country: China  Candidate: J L Wang
GTID: 2568306944455784  Subject: Computer Science and Technology
Abstract/Summary:
Image retrieval aims to return the images in a database that are most similar to a query image. With the development of the big data era, the amount of image data has increased exponentially. However, due to hardware limitations, image retrieval models can use only a small amount of image data in each training iteration, which prevents them from constructing sufficient positive and negative sample pairs. Additionally, images are limited in dimensionality and do not contain depth information about objects, which severely impacts retrieval performance. In response to the problem of insufficient positive and negative samples during model training, and inspired by the Graph Attention Network (GAT), this paper proposes a Learnable Descriptor Graph Attention Network (LDGA-Net) for image retrieval. LDGA-Net makes full use of the relationships between adjacent data: it aggregates adjacent samples through multiple learnable graph attention layers (LDGAL) and multilayer perceptrons to obtain new feature vectors, thereby constructing more positive and negative sample pairs, improving the model’s ability to mine hard negative samples, and speeding up convergence. LDGA-Net is used only during training and does not increase the computational cost of image retrieval models during testing and deployment. It can also be easily applied to various models. Experimental results show that LDGA-Net helps improve the performance of image retrieval models and reduces the number of training iterations by mining more effective samples.

Building on LDGA-Net, this paper proposes a Feature Fusion Learnable Descriptor Graph Attention Network (FFLDGA-Net) for image retrieval to address the problem of missing depth information in image data. FFLDGA-Net consists of multiple LDGA-Nets, multiscale dilated convolution modules, and a one-dimensional path aggregation network. LDGA-Net constructs new feature vectors through a relationship matrix to effectively improve the model’s ability to mine hard negative samples. FFLDGA-Net uses the LDGA-Nets only during training and removes all of them during testing or deployment to avoid additional computational burden. The multiscale dilated convolution module fuses image feature vectors with point cloud feature vectors, uses point cloud data to compensate for the missing depth information in image data, improves the interpretability of the fused features, and increases the model’s computational speed. The one-dimensional path aggregation network establishes correlations between high-dimensional and low-dimensional features to help the model better fuse feature information from different receptive fields. In addition, FFLDGA-Net uses a soft label strategy to measure the relationships between samples more accurately, reducing noise during training on autonomous driving datasets. Experimental results show that, compared with the LDGA-Net model, the proposed FFLDGA-Net model effectively fuses image and point cloud data, compensates for the missing depth information in image data, and further improves the accuracy and robustness of image retrieval.
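The idea of aggregating adjacent samples in a batch to obtain new feature vectors can be illustrated with a minimal sketch. This is not the thesis's LDGA-Net: the abstract does not give the layer definition, so the sketch assumes a plain, non-learnable attention (softmax over cosine similarities) with a residual sum, and omits the learnable parameters and multilayer perceptrons. The function names and the `temperature` parameter are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def attention_aggregate(descriptors, temperature=0.1):
    """Build one new descriptor per sample by attending over the other
    samples in the batch. Assumed simplification: attention scores are
    cosine similarities, with no learnable weights."""
    feats = [l2_normalize(d) for d in descriptors]
    out = []
    for i, fi in enumerate(feats):
        # Neighbours are all other samples in the batch (no self-loop).
        neighbours = [fj for j, fj in enumerate(feats) if j != i]
        if not neighbours:
            out.append(fi)  # a batch of one has nothing to aggregate
            continue
        scores = [sum(a * b for a, b in zip(fi, fj)) / temperature
                  for fj in neighbours]
        # Numerically stable softmax over the neighbour scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of the neighbours.
        agg = [sum(w * n[k] for w, n in zip(weights, neighbours))
               for k in range(len(fi))]
        # Residual sum with the original descriptor, then re-normalize.
        out.append(l2_normalize([a + b for a, b in zip(fi, agg)]))
    return out


batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
augmented = batch + attention_aggregate(batch)  # originals plus aggregated vectors
```

During training, the aggregated vectors could be added to the pool from which positive and negative pairs are drawn; at test time the aggregation step is simply skipped, which matches the abstract's statement that LDGA-Net adds no inference cost.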
Keywords/Search Tags:Image retrieval, Graph attention network, Feature fusion, Dilated convolution, Path aggregation network