
The Application And Research Of Improved Clustering Algorithm In Tibetan Medical Diagnosis And Treatment Data

Posted on: 2020-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Liu
Full Text: PDF
GTID: 2404330596984451
Subject: Computer technology
Abstract/Summary:
The main purpose of a clustering algorithm in data mining is to divide data into groups so that items with similar characteristics fall into the same group. Clustering algorithms are used in many fields, such as commerce, agriculture, networking, and medicine, and play an important role in data mining. As a result, more and more clustering algorithms have appeared, and cluster analysis has become an active research field. This thesis applies an improved clustering algorithm to Tibetan medicine data on chronic atrophic gastritis (CAG), then classifies and analyzes syndrome types based on the clustering results.

The first step was to choose three clustering algorithms commonly used for syndrome classification in medicine and apply them to clinical diagnosis and treatment data for CAG. Based on experimental comparison and an evaluation function, the k-means clustering algorithm was selected as the best candidate for improvement. The algorithm was then improved by using cosine similarity from the vector space model as the similarity measure. Judged by the sum of cosine values between the data points in each cluster and their respective cluster centers, the improved algorithm was more effective. Secondly, with regard to cluster initialization, the proportion selection method was used to choose the initial cluster centers. Combined with the cosine-based k-means algorithm from the first step, this produced accurate clustering results as measured by the sum of cosine values. In simulation experiments, the improved k-means algorithm combining the proportion selection method and cosine similarity achieved the highest accuracy and effectiveness, and therefore the best clustering results.

Finally, the thesis used both the original and the improved k-means clustering algorithms to analyze the CAG data. The experimental results showed some differences between the initial cluster centers and the final cluster centers. Judged by the sum of cosine values and the number of iterations, the original k-means clustering algorithm was clearly unsatisfactory. The comparative analysis of the experiments showed that the k-means algorithm improved with cosine similarity and proportion selection performed best. The similarity between the data points and the center within each cluster was also high. On the basis of the experimental results, the syndrome types were obtained and the characteristics of each syndrome type were analyzed.
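The cosine-based variant of k-means described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `cosine` and `cosine_kmeans`, the toy data, and the mean-based center update are assumptions made for illustration. The abstract does not define the proportion selection method, so the sketch simply takes the initial centers as an argument and leaves their selection to the caller.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_kmeans(data, init_centers, max_iter=100):
    """k-means variant that assigns each point to its most cosine-similar center.

    init_centers is supplied by the caller (hypothetical stand-in for the
    thesis's proportion selection step, which the abstract does not specify).
    Returns (labels, centers, total), where total is the sum of cosine values
    between every point and the center of its own cluster.
    """
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: pick the center with the highest cosine similarity.
        new_labels = [
            max(range(len(centers)), key=lambda j: cosine(x, centers[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    total = sum(cosine(x, centers[lab]) for x, lab in zip(data, labels))
    return labels, centers, total


# Toy data: two groups of vectors pointing in clearly different directions.
data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels, centers, total = cosine_kmeans(data, init_centers=[data[0], data[2]])
print(labels, round(total, 3))
```

A larger sum of cosine values for the same number of clusters indicates that points lie closer in direction to their cluster centers, which is the kind of comparison the abstract uses between the original and improved algorithms.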
Keywords/Search Tags:Data mining, Cluster analysis, Improved k-means algorithm, Proportion selection method, Cosine similarity