The Design And Realization Of The Query Engine

Posted on: 2013-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: C C Kan
Full Text: PDF
GTID: 2248330374985176
Subject: Computer system architecture
Abstract/Summary:
At present, in industry and especially on the Internet, large amounts of data are produced in daily operations, and the demand for storing, processing, and analyzing this data is growing rapidly. A single computer cannot hold data at this scale, and how to manage and analyze it has become an active research topic. Distributed computing and data analysis based on heterogeneous storage offer a good approach to the problem.

Based on Baidu’s specific application requirements, and addressing the problems exposed by the existing system at Baidu, we studied and analyzed similar systems elsewhere in the world and designed the Query Engine based on heterogeneous storage. The thesis studies ad hoc queries over data in HDFS and, on that basis, designs the Query Engine based on heterogeneous storage. As an intermediate layer that provides services, the Query Engine shields the user from the heterogeneous storage.

The main research contents are as follows:

(1) I designed and implemented the Client. The Query Master was designed by me and others, and we implemented it together. The Client is the interface through which users interact with the Query Engine. It receives the user's command-line input and parses the command-line options. The Client requests from the Query Server a Query Master to serve it, and submits the parsed command to that Query Master. The Query Server is responsible for maintaining information about all Query Masters in the system.

(2) We achieve efficient queries over heterogeneous storage through the Query Engine, which is a service layer built on heterogeneous storage. The Query Master compiles SQL and generates a physical execution plan using the metadata on the Meta Server. MapReduce Nodes carry out the distributed computation according to the dependencies in the physical plan. Finally, the results produced by the MapReduce Nodes are merged and returned. This layer shields the user from the heterogeneous storage.

(3) We designed the architecture of the Query Engine, drawing on an in-depth study of the theory of distributed computing and distributed storage and on an examination of the architecture of existing systems in the world. We analyze the function of each sub-module, give the framework and process diagram of each sub-module, and implement these sub-modules one by one. By providing functions for creating databases and tables, the Query Engine manages massive data. By providing queries over heterogeneous storage, the Query Engine offers a convenient way to analyze data.
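
The dependency-driven execution described in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the stage names, the `run_stage` callback, and the list-based merge are all hypothetical, chosen only to show how stages of a physical plan might be ordered by their dependencies before partial results are merged.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical physical plan: stage name -> set of stages it depends on.
# These stage names are illustrative; the abstract does not list any.
plan = {
    "scan_logs": set(),
    "scan_users": set(),
    "join": {"scan_logs", "scan_users"},
    "aggregate": {"join"},
}


def execution_order(plan):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(plan).static_order())


def run_plan(plan, run_stage):
    """Run each stage after its dependencies and return all stage outputs.

    run_stage is a caller-supplied function (an assumption of this sketch)
    taking the stage name and the list of outputs of its dependencies.
    """
    outputs = {}
    for stage in execution_order(plan):
        inputs = [outputs[dep] for dep in sorted(plan[stage])]
        outputs[stage] = run_stage(stage, inputs)
    return outputs


def merge_results(partials):
    """Concatenate partial results (lists of rows) into one result list."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged
```

In this sketch, `run_plan(plan, run_stage)` would call `run_stage` for both scan stages before `join`, and for `join` before `aggregate`; a real system would dispatch independent stages to worker nodes in parallel rather than running them in a single loop.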
Keywords/Search Tags: ad hoc query, heterogeneous storage, load balance, distributed