Research And Application Of Full Text Retrieval Based On Hadoop

Posted on: 2018-03-11 | Degree: Master | Type: Thesis
Country: China | Candidate: P Wang | Full Text: PDF
GTID: 2348330569486435 | Subject: Computer Science and Technology
Abstract/Summary:
Since the 1990s, the rapid development of computer technology has brought great changes to people's lives, and computers are now widely used in all walks of life. Their popularity has also brought explosive growth in data, and that data is no longer limited to ordinary text: it also includes pictures, video and other multimedia. Most of this data is meaningless, so how can the useful information be retrieved from such a vast amount of data? This question has driven the development of distributed computing and full-text retrieval technology. The most popular distributed computing framework is Hadoop, while full-text retrieval services are designed for unstructured data such as text, video and pictures. Because an internship client required a search function over images, this thesis studies image retrieval based on Hadoop.

First, the thesis studies in depth the idea of distributed computing and the Hadoop computing framework, then introduces full-text search, including its basic concepts and its core search process. It also introduces the full-text search toolkit Lucene to lay the foundation for the rest of the study. Next, it analyzes in detail the internal workflow of the Hadoop MapReduce parallel computing framework, examining how a job is transformed and how its task flow is timed in order to identify where job execution can be optimized. It then reviews several existing optimizations of scheduling algorithms and proposes its own scheme, which merges the job setup task to reduce heartbeat cycles, shortening job run time and improving efficiency.

Finally, an image retrieval system is implemented. The thesis adapts the traditional image retrieval approach, proposes a Web-based image search interface framework, and uses the optimized MapReduce framework to build the index of all images. Because only buttons are retrieved, retrieval precision is high. Treating the feature values extracted by the Tamura (texture) and CEDD (color) algorithms provided by LIRE (Lucene Image Retrieval) as joint factors of image similarity, the thesis proposes a comprehensive similarity formula and generalizes it to a comprehensive k-similarity measure. It then describes the core steps of implementing the system, which achieves the desired results.
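Full-text retrieval rests on an inverted index, and the thesis builds its index with MapReduce. The thesis indexes images; the sketch below only illustrates the underlying map/shuffle/reduce pattern on plain text documents, in a single Python process. It is not the author's implementation and does not use the Hadoop or Lucene APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`, `search`) and the regex tokenizer are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")  # hypothetical tokenizer: lower-case alphanumeric runs


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Mapper: emit (term, doc_id) once for every distinct term in a document."""
    for term in set(TOKEN.findall(text.lower())):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reducer: turn the grouped document ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    intermediate = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(intermediate).items())


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Boolean AND query: intersect the posting lists of all query terms."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {
        "d1": "Hadoop MapReduce builds the index",
        "d2": "Lucene full text search",
        "d3": "MapReduce and Lucene search",
    }
    index = build_inverted_index(docs)
    print(index["mapreduce"])
    print(search(index, "Lucene search"))
```

On a real cluster each mapper would process one input split and the framework would perform the shuffle across machines; the in-memory `shuffle` function only stands in for that step.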
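The abstract says the Tamura (texture) and CEDD (color) features from LIRE are combined into a comprehensive similarity formula, but it does not state the formula. The Python sketch below assumes one common form, a weighted sum $S = \alpha s_T + (1 - \alpha) s_C$, where $s_T$ is derived from the Euclidean distance between Tamura vectors and $s_C$ is the Tanimoto coefficient between CEDD histograms (the measure CEDD is usually compared with). The weight $\alpha$, the distance-to-similarity mapping, and all names (`ImageFeatures`, `combined_similarity`, `rank`) are hypothetical; the code does not call LIRE and works on precomputed feature vectors.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ImageFeatures:
    # Hypothetical container for precomputed descriptors (e.g. exported from LIRE).
    tamura: Sequence[float]  # texture descriptor
    cedd: Sequence[float]    # color and edge directivity histogram


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def tamura_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed mapping: Euclidean distance d -> similarity 1 / (1 + d), in (0, 1]."""
    _check_lengths(a, b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def cedd_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient; lies in [0, 1] for non-negative histograms."""
    _check_lengths(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 1.0 if denom == 0 else dot / denom  # two all-zero histograms count as identical


def combined_similarity(q: ImageFeatures, c: ImageFeatures, alpha: float = 0.5) -> float:
    """Hypothetical comprehensive score: alpha * texture + (1 - alpha) * color."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * tamura_similarity(q.tamura, c.tamura) + (1 - alpha) * cedd_similarity(q.cedd, c.cedd)


def rank(query: ImageFeatures, library: Dict[str, ImageFeatures],
         alpha: float = 0.5, top_n: int = 10) -> List[Tuple[str, float]]:
    """Score every indexed image against the query and return the best top_n."""
    scored = [(name, combined_similarity(query, feats, alpha)) for name, feats in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    query = ImageFeatures(tamura=[1.0, 2.0], cedd=[1.0, 0.0, 1.0])
    library = {
        "a": ImageFeatures(tamura=[1.0, 2.0], cedd=[1.0, 0.0, 1.0]),
        "b": ImageFeatures(tamura=[4.0, 6.0], cedd=[0.0, 1.0, 0.0]),
        "c": ImageFeatures(tamura=[1.0, 2.0], cedd=[0.0, 1.0, 0.0]),
    }
    for name, score in rank(query, library):
        print(name, round(score, 4))
```

Setting $\alpha = 0.5$ weights texture and color equally; in practice the weight would be tuned against judged query results.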
Keywords/Search Tags: distributed, full-text retrieval, MapReduce, similarity, image retrieval