Research And Design Of TCM Data Mining System Based On Hadoop

Posted on: 2018-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Chen
Full Text: PDF
GTID: 2334330533459891
Subject: Computer technology
Abstract/Summary:
Progress in modern science and technology has accelerated the digitization of medical information, and medical systems, including traditional Chinese medicine (TCM) information systems, are rapidly becoming sound and complete. With the resulting sharp increase in the volume of TCM medical records, how to use this massive body of data to prevent and diagnose disease, optimize treatment plans, and assist diagnosis and treatment has become a question of great concern to industry experts. Mining large data sets on a traditional stand-alone platform is often constrained by insufficient storage and computing capacity, and its computational efficiency cannot be improved significantly. In this context, this thesis proposes two TCM data mining schemes based on the Hadoop platform to handle large volumes of TCM data. A parallel Apriori algorithm is used to mine mixed data covering Chinese medicines, symptoms, and syndromes. A parallel K-nearest neighbor (KNN) algorithm is used to classify a group of unknown symptoms and predict the syndrome it belongs to. A TCM data mining system is also built on the Hadoop platform. The main research content is as follows.

Firstly, two classical data mining algorithms are parallelized. To address problems in the parallel Apriori computation, such as the large number of key-value pairs and frequent node I/O reads and writes, an optimization scheme is designed in which intermediate data is stored in HBase. When generating candidate itemsets, an iterative, recursive combination method replaces the traditional self-join, which improves the speed of iterative computation and accelerates the generation of frequent itemsets. For the parallel K-nearest neighbor algorithm, symptom levels are quantified and normalized to reduce the influence of different threshold values on the distance calculation, so that the target symptom group is classified objectively.

Then, on a Hadoop 2.0 cluster, the Apriori algorithm is used to analyze TCM asthma data, yielding compatibility rules among Chinese medicines, compatibility rules between medicines and syndromes, and association relationships among symptoms and between symptoms and syndromes. The KNN classification algorithm is used to predict the syndrome to which a symptom group belongs. The experimental results show that the mining results are largely consistent with the theory and have some practical guiding significance.

Finally, the system uses Webservice technology to implement a C/S architecture with the Hadoop cluster as the server, and uses Swing to build the client GUI, establishing a TCM data mining system based on Hadoop. The system integrates three modules: cluster configuration, medical record data management, and data mining. Testing of each functional module shows that the system has good interactivity and complete functionality.
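The KNN step described in the abstract (quantify symptom levels, normalize them, then classify by distance) can be illustrated with a minimal single-machine sketch. This is not the thesis's parallel Hadoop implementation. The use of min-max scaling, Euclidean distance, majority voting, the function names, and the sample severity levels and syndrome labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Scale every column to [0, 1]; also return the scaler for new rows.

    Min-max scaling is an assumed choice; the abstract only says that
    symptom levels are "quantified and normalized".
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        # A constant column carries no information, so map it to 0.0.
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]

    return [scale(r) for r in rows], scale


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest rows."""
    normalized, scale = min_max_normalize(train_rows)
    q = scale(query)
    # Euclidean distance on the normalized symptom levels (assumed metric).
    ranked = sorted(
        (math.dist(row, q), label)
        for row, label in zip(normalized, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: each row holds severity levels (0-3) for three symptoms.
records = [[3, 0, 1], [2, 0, 0], [0, 3, 2], [0, 2, 3]]
syndromes = ["syndrome_A", "syndrome_A", "syndrome_B", "syndrome_B"]
prediction = knn_predict(records, syndromes, [3, 1, 0], k=3)
```

In a MapReduce setting, the distance computation for each training record would typically run in the map phase and the top-k selection and vote in the reduce phase, but how the thesis divides this work is not stated in the abstract.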
Keywords/Search Tags: TCM data mining, parallel Apriori algorithm, parallel K-nearest neighbor algorithm, Webservice technology