The Design And Implementation Of Graph-based Index For Approximate Nearest Neighbor Search In High Dimensional Space

Posted on:2023-02-19

Degree:Master

Type:Thesis

Country:China

Candidate:X B Fan

Full Text:PDF

GTID:2558307097994899

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

With the advent of the era of big data,the demand for data analysis in various industries and fields is expanding,and the required data mining technologies are developing rapidly.The retrieval technology for large-scale high-dimensional datasets is an essential and important foundation.In database,recommendation system,machine learning and other computer professional fields,nearest neighbor search supports many important applications.Accurate nearest neighbor search technologies,which perform well in low dimensional spa-ce,has a serious decline in efficiency in high dimensional space.Therefore,researchers turn to the study of approximate nearest nei ghbor search.Among them,graph-based algorithms perform best in query efficiency.Especially for the query with high recall rate,they are several times or even tens of times more efficient than other algorithms.In view of the existing problems of graph-based algorithms,this paper puts forward various improved designs,and we list them as follows.Monotonic search network(MSNET)is the most efficient in current graph-based algorithms.It is dedicated to realize monoton ic search by minimizing the average out-degree of nodes,thus reducing the length of search path to maximize the query efficiency.However,according to our inference,the current design of monotonic search network can not guarantee the realization of mono-tonic search.According to our research,the complexity of high-dimensional space and the unpredictability of query vector are two main factors leading to the failure of monotonic search.Therefore,we propose an improved design of monotonic search network,called(r,p)-MSNET.When the distance between the query vector and its nearest neighbor is less than r,the(r,p)-MSNET guarantees that the probability of monotonic search is not less than p.Due to the high time complexity of establishing a(r,p)-MSNET,we propose an approximate implementation,called Tree-based Search Graph(TBSG).Experiments conducted on several commonly used high-dimensional vector datasets show that TBSG is significantly more efficient in query than other algorithms and has a faster index building speed.In addition,further experiments show that transplanting the neighbor selection strategy of TBSG to other graph-based algorithms can effectively improve their query efficiency.In order to solve the problems of large amount of memory cost by index and slow index construction in current graph-based algorithms,we propose a distributed index design.The distributed index design refers to dividing the original large-scale dataset into multiple sub datasets and assigning them to different hosts and constructing sub indexes respectively.When performing the query,the algorithm sorts the sub indexes according to its distance to the query vector,implements the query only in several of them according to the target recall rate,and finally returns the aggregated result.The distributed index design can significantly reduce the memory pressure of single node and improve the scalability of graph-based algorithm on very large scale datasets.

Keywords/Search Tags:

Information Retrieval, Approximate Nearest Neighbor Search(ANNS), Graph-based Search, Monotonic Search Network(MSNET)

PDF Full Text Request

Related items

1	Research On Approximate Nearest Neighbor Search Methods Based On Approximate K-NN Graph For Image Retrieval
2	Efficient Approximate Nearest Neighbor Search Based On Navigating Spreading-out Graph
3	Research On Approximate Nearest Neighbor Search Based On Query-directed Graph
4	Approximate Nearest Neighbor Search Algorithms And Their Application
5	Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph
6	Research On Adaptive Graph-based Nearest Neighbor Search Algorithm
7	Research On Hashing Accelerated Approximate Nearest-Neighbor Search
8	Research On Approximate Nearest Neighbor Search And Maximum Inner Product Search For High-dimensional Dataset
9	Research On Adaptive Nearest Neighbor Search Algorithm Based On IVF-HNSW
10	Binary Hashing And Quantization Methods For Approximate Nearest Neighbor Search