The Research Of Precipitation Using KNN Classifier Based On Hadoop

Posted on: 2014-03-21    Degree: Master    Type: Thesis
Country: China    Candidate: Y G Yan
GTID: 2250330401970439    Subject: Meteorological information technology and security
Abstract/Summary:
With the development of information technology, the world produced about 1.8 ZB of data in 2011, and the volume is growing fivefold every year. In fields such as meteorology, satellites and radar can generate 300M-500M of data in a single day, which is more than traditional data processing methods can handle. Owing to its powerful computing and storage capacity, cloud computing has developed rapidly worldwide in recent years, so migrating data mining algorithms to cloud computing platforms has become an important new approach to massive data processing.

Based on a thorough study of parallelization strategies for the KNN algorithm and of the characteristics of meteorological data, this paper predicts rainfall from the 1960-2011 rainfall data of East China in three steps: partitioning of precipitation regions, factor selection, and parallelization of the algorithm. The main work is as follows:

(1) This paper uses two-step cluster analysis to divide the six provinces and one city of East China into precipitation regions. For each region, we carry out a spatiotemporal analysis, covering the trend of precipitation change and abrupt-change (mutation) detection, in order to outline the characteristics of rainfall in East China and to choose a representative region for the rainfall prediction study.

(2) Because the KNN algorithm requires a large amount of computation and is inefficient, this paper proposes CVKNN, an algorithm based on class centre vectors, which builds the classification model from only the most representative samples (boundary samples). We describe the basic idea and implementation of the algorithm, analyse its performance, and then present a parallel implementation of CVKNN using the MapReduce programming model.

(3) We apply the parallel KNN and CVKNN algorithms to precipitation prediction. Following the precipitation partitioning scheme of Chapter 3, we select daily precipitation data from seven meteorological stations in the Yangtze River Delta region for 1960-2011, carry out precipitation prediction experiments on the Hadoop platform, and analyse the experimental results in detail.

The precipitation experiments on the cluster show that, compared with traditional KNN, the proposed CVKNN algorithm greatly reduces computing time, mainly owing to parallel execution on Hadoop. The proposed algorithm also achieves satisfactory precipitation forecast accuracy, so this paper provides a useful reference for massive meteorological data processing.
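The abstract does not say exactly how CVKNN selects its boundary samples or how the MapReduce job is laid out. The Python sketch below is one hypothetical reading, not the thesis's method: the boundary rule (a sample is kept when its distance to its own class centre is at least `ratio` times its distance to the nearest other class centre), adding the class centres themselves as prototypes, and every function name are assumptions for illustration. Hadoop is only mimicked in-process: each "split" gets a map step that returns its local k nearest neighbours, and a reduce step merges them and takes a majority vote. No Hadoop API is used, and the data are made up.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centres(samples):
    """Class centre vector (mean feature vector) for every label."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def boundary_samples(samples, ratio=0.8):
    """Hypothetical rule: keep a sample when its distance to its own class
    centre is at least `ratio` times its distance to the nearest other
    class centre, i.e. when it lies close to a class boundary."""
    centres = class_centres(samples)
    kept = []
    for features, label in samples:
        own = euclidean(features, centres[label])
        others = [euclidean(features, c) for lab, c in centres.items() if lab != label]
        if not others or own >= ratio * min(others):
            kept.append((features, label))
    return kept


def map_phase(split, query, k):
    """Mapper: the k nearest (distance, label) pairs within one data split."""
    return heapq.nsmallest(k, ((euclidean(f, query), lab) for f, lab in split))


def reduce_phase(partials, k):
    """Reducer: merge local candidates into the global k nearest, then vote."""
    merged = heapq.nsmallest(k, (p for part in partials for p in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def split_into(data, n):
    """Stand-in for input splits: deal the data round-robin into n parts."""
    return [data[i::n] for i in range(n)]


def cvknn_predict(train, query, k=1, n_splits=2):
    """Reduced model = boundary samples + class centres (assumption), then
    a simulated map/reduce nearest-neighbour vote over that model."""
    model = boundary_samples(train)
    model += [(c, lab) for lab, c in class_centres(train).items()]
    partials = [map_phase(s, query, k) for s in split_into(model, n_splits)]
    return reduce_phase(partials, k)


# Toy, made-up data: (two normalised predictors) -> "rain" / "dry".
train = [
    ((0.0, 0.0), "dry"), ((0.1, 0.2), "dry"), ((0.2, 0.1), "dry"), ((0.5, 0.5), "dry"),
    ((1.0, 1.0), "rain"), ((0.9, 0.8), "rain"), ((0.8, 0.9), "rain"), ((0.6, 0.6), "rain"),
]
print(cvknn_predict(train, (0.95, 0.95)))  # rain
print(cvknn_predict(train, (0.1, 0.1)))    # dry
```

The merge in `reduce_phase` is exact: the global k nearest neighbours of a query always lie in the union of each split's local k nearest, so splitting the work changes where the distances are computed but not the answer. In a real Hadoop job each split would be processed by its own map task, which is where the time saving the abstract attributes to Hadoop would come from. Because the reduced model is small, a small `k` is used here; with only three prototypes, `k=3` would always return the majority class of the prototypes.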
Keywords/Search Tags: Hadoop, KNN, CVKNN, parallelization, precipitation prediction