The Research Of Meteorological Data Mining Using Bayesian Classifier Based On Hadoop

Posted on: 2013-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2230330371484584
Subject: Meteorological information technology and security
Abstract/Summary:
As meteorological services continue to modernize, efficiently processing and computing over vast amounts of meteorological data has become an important issue in meteorological data mining. Distributed technology has become the foundation for applying data mining in meteorology, because it makes it possible to process such data more efficiently. Based on an analysis of the characteristics and processing of meteorological data, we select for this study the Chinese terrestrial climate data sets of daily records since 1951 from four stations in Jiangsu Province (Xuzhou, Ganyu, Nanjing, Dongtai). The major work of this paper is as follows:
(1) We introduce the technology behind the open source cloud platform Hadoop, focusing on the programming model, job process, and key technologies of MapReduce. Using the MapReduce programming model, we then carry out a classification and statistics experiment on the rainfall data. The results show that the chosen data sets are suitable for the study, because very few rainfall records in them are absent or missing.
(2) The Naive Bayes (NB) classifier is introduced and applied to rainfall data classification. Given the characteristics of the meteorological data sets, we use the correlation coefficient to select predictors and the PKI discretization method to discretize them. By training and testing on the data sets to obtain the classification accuracy, we analyze the shortcomings of the NB classifier in rainfall data classification from three aspects: the time continuity of the predictors, underflow in the probability calculations, and the discretization method.
(3) To address these shortcomings and the NB classifier's low efficiency in handling vast amounts of meteorological data, the paper proposes an improved Naive Bayes classifier based on the MapReduce model (MRNB), which applies MapReduce to three processes: preprocessing, model training, and accuracy assessment. Compared with the NB classifier, the proposed MRNB classifier makes full use of cluster resources, improves the efficiency of mining massive data, and achieves better accuracy in classifying massive meteorological data sets, as shown by the rainfall data classification experiment. The improved classifier also has good scalability, which provides a better solution for future classification-oriented data mining on massive meteorological data.
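The model-training step and the underflow issue mentioned above can be illustrated with a minimal single-machine sketch. This is not the thesis's MRNB implementation: the function names (`map_record`, `reduce_counts`, `train`, `predict`), the Laplace smoothing, and the assumption that predictors are already discretized are all illustrative choices. The map step emits count keys per record, the reduce step sums them, and prediction adds log-probabilities instead of multiplying probabilities, which is one common way to avoid floating-point underflow.

```python
import math
from collections import defaultdict


def map_record(record):
    """Map step (hypothetical): emit (key, 1) pairs for one training record.

    A record is (features, label), where features is a tuple of already
    discretized predictor values.
    """
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(records):
    """Run the map and reduce steps in-process to build the count table."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_counts(pairs)


def predict(counts, features, values_per_feature):
    """Return the label with the highest log-posterior score.

    values_per_feature[i] is the number of distinct bins of predictor i,
    used here for Laplace smoothing (an assumption of this sketch).
    """
    class_counts = {k[1]: v for k, v in counts.items() if k[0] == "class"}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in sorted(class_counts.items()):
        # Summing logs avoids the underflow of multiplying many small factors.
        score = math.log(n / total)
        for index, value in enumerate(features):
            hits = counts.get(("feature", label, index, value), 0)
            score += math.log((hits + 1) / (n + values_per_feature[index]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

On a real Hadoop cluster the map and reduce functions would run as distributed tasks over input splits; here they run in one process only to show the data flow.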
Keywords/Search Tags: Data Mining, Naïve Bayes, Hadoop, MapReduce, Rainfall