Research On Identification Of Search Requirements Based On Map-Reduce And Trie Tree

Posted on:2016-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Xiao

Full Text:PDF

GTID:2308330470462155

Subject:Management Science and Engineering

Abstract/Summary:

In the internet era, the amount of data is growing explosively, bringing both opportunities and challenges. People keep mining useful information from big data, yet they are also confronted with large amounts of redundant information. Search engines, the tools that help people find information among masses of web pages, are under pressure from this growth: as data volume increases, indexes grow larger and search tasks become harder. In fact, the vast majority of the information crawled by search engines is irrelevant to users' needs. Enabling search engines to analyze users' search needs would provide a better search experience and save unnecessary computation, so the search needs of search engine users have attracted the attention of scholars at home and abroad.

To predict search requirements, the user's search terms must be identified, and such identification usually requires some form of web log mining. The approach of this paper is to compare search terms against historical logs, training on the historical logs to obtain patterns that identify users' search needs. However, search log data is now at the terabyte scale, so this analysis and training are difficult to carry out on a single computing node.

Based on the characteristics of big data analysis, this paper presents a distributed parallel program called Paratemp. Using Map-Reduce on a distributed cluster, it mines representative classification templates. Using the support and confidence measures from association rules, it studies the selection criteria for templates.
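
As a rough illustration of template mining with support and confidence, the following single-process Python sketch imitates the map and reduce phases. It is not the Paratemp program: the entity dictionary `ENTITY_SLOTS`, the `(query, need)` log format, the need labels, and the thresholds are all hypothetical assumptions made for this example, and a real deployment would run the two phases on a cluster framework rather than in one process.

```python
from collections import defaultdict

# Hypothetical entity dictionary (an assumption, not from the thesis):
# maps a known entity word to the slot name that replaces it in a template.
ENTITY_SLOTS = {"beijing": "<city>", "shanghai": "<city>", "titanic": "<movie>"}

# Hypothetical log records: (query text, labelled search need).
SAMPLE_LOG = [
    ("beijing weather", "weather"),
    ("shanghai weather", "weather"),
    ("titanic download", "video"),
    ("titanic download", "software"),
    ("beijing map", "map"),
]


def to_template(query):
    """Replace known entities with slot names: 'beijing weather' -> '<city> weather'."""
    return " ".join(ENTITY_SLOTS.get(tok, tok) for tok in query.lower().split())


def map_phase(record):
    """Map: emit ((template, need), 1) for one log record."""
    query, need = record
    yield (to_template(query), need), 1


def reduce_phase(pairs):
    """Shuffle and reduce: sum the emitted counts for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def select_templates(log, min_support, min_confidence):
    """Keep rules template -> need whose support and confidence reach the thresholds."""
    pair_counts = reduce_phase(kv for rec in log for kv in map_phase(rec))

    # Count how often each template occurs regardless of the need.
    template_counts = defaultdict(int)
    for (template, _need), n in pair_counts.items():
        template_counts[template] += n

    total = len(log)
    selected = {}
    for (template, need), n in pair_counts.items():
        support = n / total                         # share of all records
        confidence = n / template_counts[template]  # share within the template
        if support >= min_support and confidence >= min_confidence:
            selected[(template, need)] = (support, confidence)
    return selected
```

With the sample log, `select_templates(SAMPLE_LOG, 0.3, 0.8)` keeps only the rule `<city> weather -> weather`: the `<movie> download` template is split evenly between two needs, and the remaining rules fall below the support threshold.
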
The selected templates can serve as the basis for classifying search needs. After the search templates are extracted, an efficient natural language algorithm is needed to match search terms against the new templates. This paper designs the Temparser recognition algorithm, which uses the idea of a Trie tree, trading additional space for faster computation, to recognize search templates. The final experiments demonstrate the correctness and efficiency of the Paratemp program and the Temparser algorithm. Finally, we summarize the research results and discuss future work.
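
To make the Trie idea concrete, here is a minimal Python sketch of matching a query against stored templates with a word-level trie. It is not the thesis's recognition algorithm: the class names, the slot notation such as `<city>`, and the entity dictionary passed to the constructor are hypothetical assumptions for this example. Each template is stored token by token, so a lookup costs roughly one dictionary access per query token instead of a scan over every template, at the price of the extra memory held by the nodes.

```python
class TrieNode:
    """One node of a word-level trie; `need` is set where a template ends."""

    def __init__(self):
        self.children = {}
        self.need = None


class TemplateTrie:
    """Stores templates such as '<city> weather' and matches queries against them."""

    def __init__(self, entity_slots):
        # Hypothetical dictionary mapping entity words to slot names.
        self.entity_slots = entity_slots
        self.root = TrieNode()

    def insert(self, template, need):
        """Add one template, token by token, and record its search need."""
        node = self.root
        for token in template.split():
            node = node.children.setdefault(token, TrieNode())
        node.need = need

    def match(self, query):
        """Return the need of a template matching the whole query, or None."""
        return self._walk(self.root, query.lower().split(), 0)

    def _walk(self, node, tokens, i):
        if i == len(tokens):
            return node.need
        token = tokens[i]
        # Try the literal word first, then the slot the word belongs to (if any).
        candidates = [token]
        slot = self.entity_slots.get(token)
        if slot is not None:
            candidates.append(slot)
        for key in candidates:
            child = node.children.get(key)
            if child is not None:
                found = self._walk(child, tokens, i + 1)
                if found is not None:
                    return found
        return None


def build_example_trie():
    """Build a small trie with two hypothetical templates."""
    trie = TemplateTrie({"beijing": "<city>", "shanghai": "<city>"})
    trie.insert("<city> weather", "weather")
    trie.insert("<city> map", "map")
    return trie
```

For example, `build_example_trie().match("Shanghai weather")` follows the `<city>` and `weather` edges and returns the need stored at the end of that path, while a query such as "beijing hotels" has no matching path and yields `None`.
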

Keywords/Search Tags:

Map-Reduce, Trie Tree, Search needs, Search Template

Related items

1	Research Of Personalized Search Based On Trie Tree
2	The Design And Implementation Of Site Search Engine Based On The Inverted Index And The Trie
3	The Design And Implementation Of Intelligent Search By T9Keyboard On Mobile Terminal
4	Research And Implementation Of Several Key Technologies In Intelligent Chinese Search Engine
5	Research On The Algorithms For String Similarity Search
6	The Research And Implementation Of A P2p Search Technology
7	The Design And Implementation Of Video-info Extraction System In Video Search Engine
8	Research On Keyword Query Approach Over RDF Data Based On Tree Template
9	The Study Of The Framework Of Distributed Intelligent Search Engine Based On Map/Reduce
10	Research On The Evolutionary Search Algorithms In The WEB Based On Learning