Research And Implementation Of Clustering Method Based On Feature Extraction

Posted on: 2019-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zu
Full Text: PDF
GTID: 2370330548982859
Subject: Applied Mathematics
Abstract/Summary:
Bioinformatics has developed out of cross-disciplinary research. It draws on many fields and plays several roles, and this diversity allows bioinformatics work to contribute to a wide range of life science research. With the rapid growth of bioinformatics databases, how to organize and analyze massive amounts of genetic data effectively, and how to extract useful medical and biological information from them, has become an active research topic. Gene feature extraction is an important technique for analyzing and processing such data, and it is widely used in bioinformatics, for example in studying the common functions of genes. Many methods for extracting gene features already exist. This thesis builds on existing feature extraction methods, adds new elements to them, and illustrates the advantages of the resulting methods by comparing experimental results. The main work of this thesis is as follows:

(1) Based on the characteristics of gene sequence classification and on fuzzy cluster analysis, the interaction values of nucleic acid base pairs are introduced into the original Markov chain model gene clustering method to obtain a distance matrix with dual properties. Following the fuzzy cluster analysis method, a fuzzy similarity matrix and a dynamic clustering map are then obtained, which gives a classification of the gene sequences. Fuzzy clustering of the p53 gene sequences of 16 species, including human, shows that more closely related species are more readily clustered into one category. In addition, a comparison of the clustering results of the dual-property matrix method with those of the original single-property method shows that the dual-property method is more accurate.

(2) The feature extraction in (1) uses the transition probabilities of base pairs in the Markov chain model and ignores the positional information of the base pairs. To address this, a 48-dimensional feature vector method is constructed by calculating feature vectors for the number, position, and variation regularity of the base pairs in a gene sequence, and a 12-dimensional feature vector method is constructed by calculating feature vectors for the number, position, and variation regularity of the individual bases in the sequence. Using these two methods, the p53 gene, mammalian mitochondria, and avian influenza virus (H7N9) data sets were clustered and analyzed. The experimental results show that clustering with the 48-dimensional feature vector method reflects the biological features of the sequences more accurately.

(3) To avoid heavy computation, this thesis introduces the electron-ion interaction pseudopotential (EIIP), a physical property that represents the average energy of the free electrons of a nucleotide, to map DNA sequences into numerical signals, and uses power spectra to analyze the period-3 property of gene sequences. The discrete Fourier transform is then used to obtain the feature spectrum of each gene sequence, and a 12-dimensional feature vector method is constructed. This method not only includes the position information of the bases in the gene sequence but also simplifies the calculation. p53 family gene data are selected for hierarchical clustering, and the results are compared with the DNA sequence analysis method based on the Voss mapping power spectrum. The results show that the proposed method is more accurate.
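The signal-processing step in item (3) can be illustrated with a minimal sketch. It assumes the EIIP values commonly quoted in the literature for the four nucleotides, and it uses a direct discrete Fourier transform so that it needs only the Python standard library. The function names and the `period3_ratio` summary statistic are illustrative assumptions, not taken from the thesis. The abstract does not describe how the 12-dimensional feature vector is built, so that step is not reproduced here.

```python
import cmath

# Commonly quoted EIIP values per nucleotide (an assumption of this sketch;
# the abstract does not list the values the thesis used).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(seq):
    """Map a DNA string to a numeric signal, one EIIP value per base."""
    return [EIIP[base] for base in seq.upper()]


def power_spectrum(signal):
    """Return |X[k]|^2 for k = 0..N-1 using a direct O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def period3_ratio(seq):
    """Power at k = N // 3 divided by the mean power over k = 1..N-1.

    Illustrative statistic only: a large value indicates a strong
    period-3 component relative to the rest of the spectrum.
    """
    spectrum = power_spectrum(eiip_signal(seq))
    n = len(spectrum)
    non_dc = spectrum[1:]  # skip k = 0, which only reflects the signal mean
    mean_power = sum(non_dc) / len(non_dc)
    if mean_power == 0:
        return 0.0
    return spectrum[n // 3] / mean_power
```

For a sequence with an exact three-base repeat, such as `"ATG" * 10`, the non-zero power away from `k = 0` sits at `k = N/3` and its mirror `k = 2N/3`, which is the period-3 peak that this kind of analysis looks for. For real sequences, a fast Fourier transform routine would normally replace the direct DFT shown here.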
Keywords/Search Tags: Gene Feature Extraction, Cluster Analysis, Markov Model, Base Pair, EIIP, Power Spectrum