Key Technology Of QAR Data Organization And Analysis Based On Hadoop

Posted on: 2017-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
GTID: 2322330503488058
Subject: Computer Science and Technology
Abstract/Summary:
QAR (Quick Access Recorder) is a widely used aircraft logging device that records valuable flight information. At present, QAR data is stored mainly in two forms: CSV (Comma-Separated Values) files and relational databases. With the rapid development of civil aviation, the volume of QAR data is growing quickly. Existing storage formats can no longer support data at this scale, and algorithms, computing speed, and memory and disk capacity all face severe challenges. It is therefore important to build a new storage and analysis architecture that meets the needs of aviation enterprises for mass QAR data storage, query, and analysis.

This paper presents a Hive-based QAR data warehouse and a method for fast query and analysis, in order to address the limitations of the existing data warehouse. Based on an analysis of the characteristics of Hive and of the QAR data structure, the overall architecture and storage structure of the Hive-based QAR data warehouse are designed. Data in the existing data warehouse is ported to the Hive-based QAR data warehouse, so that the new warehouse remains compatible with the existing one.

Extracting the valuable information hidden in QAR data requires a range of data mining methods. Frequent pattern mining is a very effective way to acquire knowledge from data, but the traditional algorithms cannot handle the massive data stored in the Hive-based QAR data warehouse. To make full use of that data, the traditional algorithm needs to run on a distributed platform. H-mine is an efficient algorithm for frequent pattern mining. Based on an in-depth analysis of H-mine, this paper proposes a novel MapReduce-based H-mine algorithm, called MRH-mine. MRH-mine adapts H-mine to a distributed execution environment. Experimental results show that the Hive-based QAR data warehouse and MRH-mine both achieve good performance and scalability as the data volume grows.
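As a rough illustration of how frequent pattern mining can be split into map and reduce steps, the sketch below counts item supports across data partitions and keeps the items that meet a minimum support. This is only the generic first pass shared by many parallel frequent pattern miners, not the MRH-mine algorithm itself, whose details the abstract does not give. The function names, the event labels, and the partition layout are all hypothetical, and the code runs in a single process rather than on Hadoop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transactions):
    """Emit (item, 1) once for each distinct item in each transaction."""
    for transaction in transactions:
        for item in set(transaction):
            yield item, 1


def reduce_phase(pairs):
    """Sum the emitted counts per item to obtain its support."""
    counts = defaultdict(int)
    for item, count in pairs:
        counts[item] += count
    return dict(counts)


def frequent_items(partitions, min_support):
    """Run the map step on every partition, then reduce and filter."""
    pairs = chain.from_iterable(map_phase(p) for p in partitions)
    counts = reduce_phase(pairs)
    return {item: n for item, n in counts.items() if n >= min_support}


# Hypothetical partitions: each transaction is a set of made-up event
# labels standing in for discretized QAR parameters.
partitions = [
    [["flap_down", "gear_down", "low_speed"], ["gear_down", "low_speed"]],
    [["flap_down", "gear_down"], ["high_pitch"]],
]
result = frequent_items(partitions, min_support=2)
```

In a real MapReduce job the map step would run independently on each data block and the framework would group the emitted pairs by key before the reduce step. Longer patterns are then mined only over the items that survive this filter.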
Keywords/Search Tags: Hadoop, QAR, Data warehouse, Hive, H-mine