The Research And Application Of Storage And Mining Methods For Massive In-Vehicle Information

Posted on: 2015-01-24  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Xie  Full Text: PDF
GTID: 2308330464470082  Subject: Electronics and Communications Engineering
Abstract/Summary:
The rapid growth of information in today's society has made the big data problem increasingly urgent. Traditional tools, such as conventional databases and business intelligence tools, are no longer adequate for analyzing enterprise data. Hadoop, an Apache Software Foundation project, has become one of the leading solutions for analyzing huge amounts of data as its user base has grown and the platform has matured. Hadoop applications have gradually expanded to telecommunications, e-commerce, banking and other industries. The logistics industry, which generates large amounts of traffic data, can take full advantage of Hadoop's strengths in storing and analyzing data. Distribution vehicles are fitted with on-board equipment containing a GPS tracker and an embedded computing module, which makes it possible to monitor transportation, analyze the collected data for manufacturers, and extract commercially valuable information at the same time. Adopting the Hadoop platform means moving from a traditional database to distributed storage and importing the existing data from the traditional relational database into HBase. However, there is no mature system or efficient method for importing data in a short time once HBase has been adopted. The available material on HBase covers only HBase shell commands or some simple APIs, which cannot meet the need for rapid bulk import. Even where effective improvements are mentioned, no specific operations or implementations are given. This paper addresses methods for importing huge amounts of data into the HBase database.
Based on the performance of the platform, HBase's internal operating mechanism and the function of dozens of configuration parameters, some configuration parameters were modified. The rowkey design was improved to save space and to make both locating and importing data more efficient. Region splitting, used for load balancing across HRegionServers, was implemented in detail and tested experimentally to obtain the optimum partitioning scheme. The paper also presents the bulk load method for importing data and an improvement to it, in combination with the MapReduce model. Sqoop's source code is used to embed shell commands into the program to improve its flexibility. Finally, the methods above are tested experimentally and the results are analyzed.
To meet customers' demands, useful information was mined from massive amounts of GPS data using three MapReduce-based methods. First, a method to count vehicle density in given regions is designed, in which two approaches are adopted to locate the geographical position: building a spatial database based on SQL Server, and generating a quadtree that holds geographic information. Second, a method to count the vehicles passing near given gas stations is designed, in which a custom Writable class keeps the values sorted in chronological order, preventing correlated data from being disordered or scattered across different reduce tasks before the reduce phase. Third, the paper discusses a method to count vehicles travelling along a given path, whose key step is to determine whether a driven route and the given path overlap. Finally, taking into account the platform configuration and the characteristics of these algorithms, the related runtime parameters of the programs are tuned experimentally, which improves the efficiency of data analysis.
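The abstract does not give the actual rowkey layout or partitioning scheme, so the following is only a minimal sketch of one common way a compact rowkey and region pre-splitting can work together. Every detail is an assumption made for illustration: the salt-byte layout, the use of an MD5 digest of the vehicle ID, the bucket count `NUM_BUCKETS`, and the helper names `make_rowkey`, `split_points` and `region_index`. It uses plain Python with no HBase client, and only models how keys would be distributed across pre-split regions.

```python
import bisect
import hashlib
import struct

NUM_BUCKETS = 8  # hypothetical number of pre-split regions


def make_rowkey(vehicle_id: str, epoch_seconds: int) -> bytes:
    """Build a fixed-width 9-byte key: salt + vehicle hash + timestamp.

    Assumed layout (not taken from the thesis):
      1 byte  salt, derived from the vehicle ID, spreads writes over regions
      4 bytes of the MD5 digest, a compact stand-in for the vehicle ID
      4 bytes big-endian timestamp, so one vehicle's rows sort by time
    The original vehicle ID is not recoverable from the key and would have
    to be stored in a column.
    """
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    salt = digest[0] % NUM_BUCKETS
    return bytes([salt]) + digest[1:5] + struct.pack(">I", epoch_seconds)


def split_points(num_buckets: int = NUM_BUCKETS) -> list:
    """Split keys for pre-creating regions: one boundary per salt value."""
    return [bytes([b]) for b in range(1, num_buckets)]


def region_index(rowkey: bytes, splits: list) -> int:
    """Index of the region [start, end) that would hold this rowkey."""
    return bisect.bisect_right(splits, rowkey)


if __name__ == "__main__":
    splits = split_points()
    for vid in ("truck-001", "truck-002", "truck-003"):
        key = make_rowkey(vid, 1421020800)
        print(vid, key.hex(), "-> region", region_index(key, splits))
```

In this sketch all rows of one vehicle share a salt and a hash prefix, so they stay contiguous and time-ordered within a single region, while different vehicles are spread across regions. Whether that trade-off matches the optimum scheme found in the thesis experiments is not stated in the abstract.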
Keywords/Search Tags: Hadoop, MapReduce, HBase, Sqoop, spatial data, GPS, data processing