Top-k Join Query Processing Method Based On MapReduce

Posted on:2017-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:S P Liu

Full Text:PDF

GTID:2308330482499726

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet and the arrival of the big data era, massive amounts of data are being generated on the Internet. The top-k join query is now widely used in fields such as e-business and Internet services because it performs well at tasks such as forecasting business trends, understanding customers' needs, and evaluating goods. MapReduce, a distributed processing framework, is widely used in data processing for its reliability, scalability, efficiency, and fault tolerance. This thesis addresses top-k join query processing in the MapReduce environment, so that valuable information can be obtained quickly from large data sets.

First, for top-k join queries over massive data, the author proposes a top-k join query method based on MapReduce. In the Map phase, a randomized algorithm balances the partitions so that each Reduce task handles a similar or identical amount of data; this avoids data skew and keeps the running times of the tasks roughly equal. In the Reduce phase, the author builds a new table by combining the join key with the two indexes, sorts it by join key, and scans it in order to perform a preliminary join and update the index segmentation information table in real time. By scanning the index segmentation information table, the method determines the threshold, finds the join indexes that contain the top k scores, then reads the corresponding tuples from the two tables and joins them. The method does not join all tuples when computing the top-k join: it filters out many tuples using the threshold and joins only those that may appear in the final result, which saves a great deal of time.

Second, because different users have different preferences in a query, the author presents a top-k join query processing method based on preference.
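As a rough illustration of the threshold filtering used by the first method, the following Python sketch computes a top-k join on a single machine. It is not the author's algorithm: it has no MapReduce phases and no index segmentation information table. The function name `top_k_join`, the `(join_key, score)` tuple layout, and the use of the sum of the two scores as the scoring function are all assumptions made for the example. The point it shows is that once k candidates are known, the k-th best score acts as a threshold, and any pair whose best possible score cannot exceed it is never joined.

```python
import heapq
from collections import defaultdict


def top_k_join(left, right, k):
    """Return the k best joined rows as (total, key, left_score, right_score).

    left, right: iterables of (join_key, score) pairs (assumed layout).
    The scoring function is assumed to be left_score + right_score.
    """
    if k <= 0:
        return []

    # Group the scores of each table by join key.
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for key, score in left:
        left_groups[key].append(score)
    for key, score in right:
        right_groups[key].append(score)

    # Min-heap of the best k results so far; heap[0][0] is the threshold.
    heap = []
    for key in left_groups.keys() & right_groups.keys():
        left_scores = sorted(left_groups[key], reverse=True)
        right_scores = sorted(right_groups[key], reverse=True)
        for ls in left_scores:
            # Upper bound for this left tuple: pair it with the best right score.
            if len(heap) == k and ls + right_scores[0] <= heap[0][0]:
                break  # remaining left scores are smaller, so skip them all
            for rs in right_scores:
                total = ls + rs
                if len(heap) < k:
                    heapq.heappush(heap, (total, key, ls, rs))
                elif total > heap[0][0]:
                    heapq.heapreplace(heap, (total, key, ls, rs))
                else:
                    break  # remaining right scores are smaller
    return sorted(heap, reverse=True)
```

For example, calling `top_k_join([("a", 9), ("a", 4), ("b", 7)], [("a", 8), ("b", 6), ("b", 2)], 2)` joins only the pairs that can still beat the current threshold and skips the rest.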
Based on how users define their preferences, the author finds that skyline techniques handle user preferences well. The method first joins the two tables in a preprocessing step, applies the skyline to the users' preferences, and filters out tuples that cannot satisfy them. It then obtains the required top-k join results with a scoring function.

Lastly, to handle users' preferences well, the thesis proposes a skyline-based preference processing algorithm. The author first extracts the users' preference dimensions from the join results and partitions the data space into blocks. By determining the dominance relationships between blocks, the tuples in dominated blocks are filtered out. A skyline algorithm then filters the dominated tuples within each remaining block and computes the virtual minimum point of each block. The data in a block is compared with the virtual minimum points of the other remaining blocks to decide whether it needs to be compared with the data in those blocks. In this way, tuple-to-tuple comparisons are replaced by block-to-block and tuple-to-block comparisons, and dominated blocks and tuples are filtered out during the comparison. This reduces the data size, shortens execution time, and improves system efficiency.

Furthermore, the author conducts extensive experiments to verify the feasibility and scalability of the proposed methods. The experimental results show that the MapReduce-based top-k join query method handles top-k join queries over large-scale data well, and that the preference-based top-k join query method satisfies users' preferences and solves some practical problems.
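The block-level dominance filtering described in the abstract can be sketched in Python as follows. This is a minimal single-machine illustration under stated assumptions, not the thesis's implementation: smaller values are assumed to be preferred on every dimension, the data space is cut into a uniform grid with a hypothetical `cell_size` parameter, and the lower corner of a grid cell stands in for the "virtual minimum point" of a block. The names `dominates`, `skyline`, and `block_skyline` are invented for the example.

```python
import math
from collections import defaultdict


def dominates(a, b):
    """True if a is no worse than b on every dimension and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Block-nested-loop skyline: keep only points no other point dominates."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def block_skyline(points, cell_size):
    """Skyline with a block-level filter applied before tuple comparisons."""
    # Assign each point to a grid cell (block) identified by its cell index.
    blocks = defaultdict(list)
    for p in points:
        blocks[tuple(math.floor(x / cell_size) for x in p)].append(p)

    # A non-empty block dominates another when its upper corner is at or below
    # the other block's lower corner on every dimension; then every point in it
    # dominates every point in the other block, so that block is dropped whole.
    surviving = [
        pts
        for idx, pts in blocks.items()
        if not any(all(o + 1 <= i for o, i in zip(other, idx)) for other in blocks)
    ]

    # Local skyline inside each surviving block, then a final merge.
    local = [p for pts in surviving for p in skyline(pts)]
    return skyline(local)
```

With the points `[(1, 9), (2, 2), (3, 8), (9, 1), (6, 6), (8, 8)]` and `cell_size=4`, the blocks holding `(6, 6)` and `(8, 8)` are discarded without any tuple-level comparison because the block holding `(2, 2)` dominates them.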

Keywords/Search Tags:

MapReduce, Top-k join, Preference, Skyline

Related items

1	Research Of Query Processing Method On Top-k Skyline In Mapreduce
2	Research On Skyline Query Processing Techniques
3	Research On Skyline Query Based On MapReduce
4	Research Of Dynamic Skyline Query Processing Approach In MapReduce
5	Research On Improvement Of Similarity Join In MapReduce
6	Research And Optimization Of Join Algorithm Based On MapReduce
7	Optimum Design Of Table Join Algorithm Based On MapReduce
8	Research And Design Of KNN-join Algorithm Based On MapReduce
9	Join Method Research Based On MapReduce
10	Design And Optimize Big-Data Join Algorithms Using MapReduce