Analysis And Prediction Of Big Data Of Chinese Medicinal Materials Based On R+ Hadoop

Posted on: 2017-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: S S Wang
GTID: 2284330509953337
Subject: Computer technology
Abstract/Summary:
The data centers of Gansu Hui Sen Pharmaceuticals have collected and stored nearly 70 GB of data, and the volume is growing explosively as the business expands. The Hadoop parallel framework is well suited to processing such massive Chinese medicinal materials data, but it lacks capabilities for data modeling and data visualization. This thesis therefore combines the strengths of Hadoop and the R language. Taking into account the characteristics of Chinese medicinal materials market data, and addressing current difficulties such as hard-to-gauge demand across varieties and uncertain price fluctuations, it designs and implements the analysis and prediction of Chinese medicinal materials big data on the basis of R and Hadoop. The goal is reliable processing of Chinese medicinal materials market data, which is important for accelerating the development of the Chinese medicinal materials industry in Gansu and for resisting risk in the trade market. The main research contents of this thesis are as follows:
(1) A development environment consisting of a Hadoop cluster, the R language, and the Hive framework was set up. The author proposes a method for the analysis and prediction of Chinese medicinal materials big data based on R and Hadoop, and studies the data analysis and visualization process in the combined R, Hadoop, and Hive environment.
(2) Based on an in-depth analysis of the basic principles and internal structure of the Hadoop framework, and working at the level of the program code, the thesis aims to improve the computational performance of the MapReduce programming model by improving its built-in reading and writing, partitioning, and input and output formats.
(3) To join two data sources with different formats, namely the Chinese medicinal materials market data and weather data, the thesis proposes a method for preprocessing Chinese medicinal materials big data based on Hadoop and Hive.
(4) To predict the market price of Chinese medicinal materials reliably, the thesis takes an exploratory analysis approach. First, two regression models, a multivariate linear model and a decision tree model, are fitted to the Chinese medicinal materials market data. Then, to overcome the limitations of a single model, a random forest model is used for further regression analysis of the data. Finally, the fitted models are validated through performance comparison and cross-validation, and the best prediction model for the Chinese medicinal materials market price is selected.
(5) To verify the reliability and validity of the best prediction model, the predicted values of the models are compared with the true values.
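The model selection step described in the abstract (comparing candidate regression models by cross-validation and keeping the best one) can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the thesis works in R with multivariate linear, decision tree, and random forest models, whereas this sketch uses plain Python with two stand-in models (a mean baseline and a one-variable least-squares line) and synthetic data. All names here (`cv_rmse`, `fit_mean`, `fit_linear`, the generated `xs` and `ys`) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Stand-in baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Stand-in linear model: one-variable ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Root mean squared error over all held-out points in k-fold CV."""
    sq_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return mean(sq_errors) ** 0.5


# Synthetic data (an assumption for illustration, not thesis data):
# a price that rises linearly with one feature, plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(60)]
ys = [20.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]

scores = {
    "mean_baseline": cv_rmse(fit_mean, xs, ys),
    "linear": cv_rmse(fit_linear, xs, ys),
}
best = min(scores, key=scores.get)  # model with the lowest CV error
print(scores, best)
```

The same loop applies to any set of candidate models: each one is fitted only on the training folds and scored on the held-out fold, so the comparison reflects prediction error on unseen data rather than fit to the training set.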
Keywords/Search Tags:Chinese medicinal materials, Hadoop, R language, Hive, Decision tree, Random Forests