
Improved Clustering Analysis And Its Application In DNA Sequences

Posted on: 2020-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Chen
GTID: 2393330599962858
Subject: Computer application technology
Abstract/Summary:
Clustering analysis is an interdisciplinary data analysis method in data mining. As clustering algorithms are applied ever more widely and frequently, improved algorithms continue to be developed. Because unclear class boundaries cause problems in many applications, fuzzy clustering has become increasingly widespread. Many researchers in China and abroad have recast clustering as a graph partitioning problem, and graph-based spectral clustering is becoming popular. Fuzzy clustering and spectral clustering have become research hotspots, but neither algorithm is universally applicable, and each has shortcomings. To optimize clustering further, other methods can be used to combine the two and improve clustering performance.

China is a major agricultural country, and maize has always been one of its main crops. However, maize yield has not risen along with demand. The main limiting factor is pests and diseases, and the corn borer is the main pest of maize. At present, three methods are used to control corn borer in China: chemical control, biological control and agricultural control. None of them is well targeted, and their effect is limited. To study the growth and development of different kinds of corn borer and to achieve better control, this paper conjectures that corn borers differ by host and by geographical location. It tests the conjecture with genetic diversity analysis and improved cluster analysis, based on the known geographical locations and hosts of the samples and their genetic data. The clustering results are then checked by SVM classification.

For the clustering itself, this paper proposes an improved method that combines the molecular connectivity index, the analytic hierarchy process (AHP) and the Mahalanobis distance. First, the molecular connectivity index is introduced for feature selection, which avoids the ambiguous clustering results caused by using simple base percentage content as the feature. Second, in processing the feature values, AHP is used to judge the relative importance of the different features, and the Mahalanobis distance is used to construct the fuzzy similarity matrix. Together these address two weaknesses of traditional clustering methods: interference from correlation among the factors, and failure to account for the differing importance of features for the clustering objective. The improved clustering analysis is implemented in MATLAB.

The analysis concludes that corn borers show population differences by geographical location but not by host. Compared with the traditional algorithm, the improved algorithm is free of interference from correlation between variables and achieves a better clustering effect and higher clustering accuracy. Finally, an SVM classifier is used to classify and test the gene sequences of corn borers from different geographical locations. The results show that the conclusion of the improved clustering analysis, that corn borer populations differ by geographical location, is highly reliable.
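
The construction of the fuzzy similarity matrix from AHP weights and the Mahalanobis distance can be sketched as follows. This is a minimal illustration in Python with NumPy, not the thesis's MATLAB implementation, and several details are assumptions that the abstract does not specify: the function names `ahp_weights`, `weighted_mahalanobis` and `fuzzy_similarity` are hypothetical, the weights are taken from the principal eigenvector of a pairwise comparison matrix, the weights enter the distance by scaling feature differences by their square roots, and distances are mapped to similarities by `1 - d / max(d)`. The sample data are random placeholders, not corn borer features.

```python
import numpy as np


def ahp_weights(pairwise):
    """Feature weights from an AHP pairwise comparison matrix.

    Uses the normalized principal eigenvector, the usual AHP priority vector.
    """
    pairwise = np.asarray(pairwise, dtype=float)
    eigvals, eigvecs = np.linalg.eig(pairwise)
    principal = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, principal].real)
    return w / w.sum()


def weighted_mahalanobis(X, weights):
    """Pairwise weighted Mahalanobis distances between the rows of X.

    Assumption: differences are scaled by sqrt(weight) while the covariance
    comes from the unscaled data. Scaling the data itself before estimating
    the covariance would cancel out, because the Mahalanobis distance is
    invariant to rescaling of features.
    """
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv tolerates singular covariance
    diff = (X[:, None, :] - X[None, :, :]) * np.sqrt(weights)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off


def fuzzy_similarity(X, weights):
    """Fuzzy similarity matrix with entries in [0, 1] and ones on the diagonal.

    Assumption: similarity is 1 - d / max(d); other mappings are possible.
    """
    d = weighted_mahalanobis(X, weights)
    d_max = d.max()
    if d_max == 0.0:
        return np.ones_like(d)
    return 1.0 - d / d_max


# Hypothetical judgment matrix: feature 1 is twice as important as
# feature 2 and four times as important as feature 3.
pairwise = [[1.0, 2.0, 4.0],
            [0.5, 1.0, 2.0],
            [0.25, 0.5, 1.0]]
weights = ahp_weights(pairwise)

# Placeholder data: 8 samples described by 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
R = fuzzy_similarity(X, weights)
```

The resulting matrix `R` is reflexive and symmetric, which is the starting point for fuzzy clustering; a transitive closure or a threshold cut would typically follow, and that step is not shown here.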
Keywords/Search Tags: molecular connectivity index, analytic hierarchy process, Mahalanobis distance, fuzzy clustering, SVM