Clustering Algorithm Research Based On Gene Expression Spectrum Data

Posted on: 2014-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: L R Wang
GTID: 2268330422467160
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Bioinformatics, which emerged alongside the Human Genome Project, is a new interdisciplinary field that integrates computer science, biology, physics, and applied mathematics. Gene expression data carry a great deal of genetic information that helps decipher the essence of life, gene function and characteristics, the mechanisms of life phenomena, and the regulation of and relationships among genes, thereby promoting the rapid development of medicine. Gene expression data are characterized by small sample sizes, high dimensionality, and high noise, so this work mainly studies how to handle hundreds of thousands of gene expression measurements. Gene chip and microarray data analysis methods are very important technologies for the development of bioinformatics. Large volumes of microarray data must be analyzed with appropriate data analysis techniques in order to uncover the regulatory mechanisms of gene expression and reveal the essence of life phenomena.

Clustering analysis is an important branch of data mining. It is a multivariate statistical analysis technique and an effective way of grouping gene expression data. Without any predefined classification standard, clustering analysis classifies the research objects automatically according to their characteristics, chiefly the similarities and differences among samples. The goal is that similar objects are placed in the same class as far as possible while dissimilar objects are placed in different classes; that is, distances within a class are minimized and distances between classes are maximized.
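The clustering criterion described above (small distances within a class, large distances between classes) can be illustrated with a minimal sketch. The function names `centroid` and `within_and_between` are hypothetical and chosen only for this illustration, and Euclidean distance to cluster centroids is an assumption; the abstract does not state which distance measure the thesis uses.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_and_between(samples, labels):
    """Return (mean sample-to-own-centroid distance,
               mean distance between pairs of cluster centroids).

    Assumes at least two distinct labels, so that a between-cluster
    distance exists.
    """
    clusters = {}
    for x, k in zip(samples, labels):
        clusters.setdefault(k, []).append(x)
    centers = {k: centroid(pts) for k, pts in clusters.items()}

    # Within-class term: how far each sample lies from its own centroid.
    within = sum(dist(x, centers[k]) for x, k in zip(samples, labels)) / len(samples)

    # Between-class term: how far apart the cluster centroids are.
    pairs = list(combinations(centers.values(), 2))
    between = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return within, between


if __name__ == "__main__":
    samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(within_and_between(samples, [0, 0, 1, 1]))  # compact, well-separated grouping
    print(within_and_between(samples, [0, 1, 0, 1]))  # poor grouping of the same samples
```

A grouping that matches the criterion yields a small first value and a large second value; a poor grouping of the same samples reverses that relationship.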
Currently, clustering analysis is widely applied in many fields, and it is also the main analysis method for gene expression data.

This paper mainly discusses clustering problems for gene expression data. The main work is as follows:
(1) First, the paper introduces background knowledge on bioinformatics and microarray data, together with the principles and applications of common clustering algorithms.
(2) Second, it describes in detail the basic principle of the traditional particle swarm optimization algorithm and analyzes the drawbacks of improved particle swarm algorithms. Building on previous studies, and to improve the optimization performance and convergence speed of the algorithm, a time-invariant weighting factor is introduced into the traditional particle swarm algorithm, that is, a combination of a weighting factor and a constriction (compression) factor.
(3) Third, in light of the characteristics of gene expression data and of clustering algorithms, the improved particle swarm optimization is applied to particle-pair optimization, which is combined with K-means. The main data sets studied are the leukemia, butterfly migration, and colon cancer data sets. K-means based on the improved particle-pair optimization is used to process the data; according to the simulation results, it obtains good clustering results and improves accuracy compared with K-means.
(4) Fourth, the paper introduces the basic principle of the genetic algorithm, including the different choices available for the various genetic operators, and summarizes the advantages and disadvantages of the algorithms. Finally, to demonstrate the feasibility of the improved algorithm, the clustering performance on feature-selected data is compared with that on the original data.
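Item (2) above combines a weighting factor with a constriction factor in the particle swarm velocity update. The abstract does not give the exact formula, so the following is only a sketch under stated assumptions: a constant inertia weight `w`, the Clerc-Kennedy constriction coefficient, and a simple sphere function as a stand-in objective. The names `constriction`, `pso`, and `sphere` and all parameter values are hypothetical and are not taken from the thesis.

```python
import math
import random


def constriction(c1, c2):
    """Clerc-Kennedy constriction coefficient; requires c1 + c2 > 4."""
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def pso(objective, dim, n_particles=20, iters=100,
        w=0.9, c1=2.05, c2=2.05, bound=5.0, seed=0):
    """Minimize `objective` with a PSO whose velocity update is scaled by
    both a constant inertia weight `w` and a constriction coefficient
    (an assumed way of combining the two factors)."""
    rng = random.Random(seed)
    chi = constriction(c1, c2)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward personal and global bests,
                # all scaled by the constriction coefficient.
                vel[i][d] = chi * (w * vel[i][d]
                                   + c1 * r1 * (pbest[i][d] - pos[i][d])
                                   + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = pso(sphere, dim=2)
    print(best, value)
```

In a clustering setting such as item (3), the objective would instead score a candidate set of cluster centers, for example by the total distance of samples to their nearest center. That substitution is likewise an assumption about how such a hybrid might be built and does not describe the thesis's particle-pair method.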
Keywords/Search Tags: Bioinformatics, Microarray, Clustering analysis, Genetic Algorithm, Particle Swarm Optimization