The Research Of Cancer Classification Based On DNA Microarray Data

Posted on: 2011-07-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Yu
GTID: 1114330332959895
Subject: Computer application technology
Abstract/Summary:
With the near completion of the Human Genome Project, life science has entered the Post-Genome Era. Research in this era focuses mainly on the functions and dynamics of the whole genome rather than of individual genes, which demands the ability to process large quantities of biological information. DNA microarray (i.e., gene chip) technology is one of the major hallmarks of the Post-Genome Era and a primary research field in bioinformatics. With this technology, the expression levels of tens of thousands of genes can be measured simultaneously. It has been widely applied to diagnosing disease at the molecular level (especially cancer), recognizing disease subtypes, clarifying the mechanisms of specific diseases, and accelerating the development of new drugs. However, because the experiments are expensive, a microarray dataset contains only a few samples, which leads to the high-dimension, small-sample problem. How to mine useful information from such data and use it to guide cancer classification and subtype recognition has therefore received considerable attention in machine learning and pattern recognition. This dissertation studies several aspects of cancer classification based on microarray data. The main work is as follows:

(1) Wrapper feature gene selection methods generally have two drawbacks: slow convergence and convergence to local optima. To address them, two feature gene selection methods based on swarm intelligence are proposed: one based on ant colony optimization and one based on an improved discrete particle swarm optimization. The former is easy to implement and quickly reaches a good solution, which effectively addresses slow convergence. The latter avoids local optima by adding a simple rule, so that new optimal solutions continue to be found.

(2) Selective ensemble classification methods generally have high time complexity. This dissertation therefore presents an ensemble classification method based on correlation analysis. It reduces computational complexity by selecting diverse classifiers at the training-subset level rather than at the classifier level. At the same time, the approach maintains classification accuracy and saves storage cost, which improves its usability.

(3) A multiclass microarray data classification approach is developed. First, one-versus-rest support vector machines are used to classify the test samples. Next, the confidence of each classification result is evaluated and the samples with low confidence are extracted. Finally, the extracted samples are re-estimated by a novel strategy, the class priority estimation method based on centroid distance. The proposed method improves the recognition rate without a noticeable increase in computational complexity.

(4) Given the small sample size of microarray data, an incremental cancer diagnosis method based on unlabeled samples is proposed. First, an initial diagnostic system is trained with the few existing labeled samples; it diagnoses test samples in clinical use and quantitatively estimates the confidence of each diagnosis. Then, according to that confidence, the system decides whether each sample should be returned to human medical experts for diagnosis with other detection methods. Finally, the newly labeled samples are added to the labeled set to update the system. The proposed method guarantees diagnostic accuracy while maintaining the system's utilization, and it allows the system to improve its own performance incrementally. Compared with traditional approaches, the proposed method is more practical in clinical settings.
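The confidence-gated loop described in (4) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract specifies neither the classifier nor the confidence measure, so the sketch assumes a nearest-centroid classifier (the hypothetical `NearestCentroidDiagnoser`), a confidence score of 1 - d1/d2 computed from the distances to the two nearest class centroids, a hypothetical threshold `tau`, and a hypothetical `expert_label` callback standing in for the human expert. It also assumes that only the expert-labeled samples are added back to the labeled set.

```python
import numpy as np


class NearestCentroidDiagnoser:
    """Hypothetical stand-in for the diagnostic system: a nearest-centroid classifier."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid (mean expression profile) per class.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_with_confidence(self, X):
        # Euclidean distance from each sample to each class centroid, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        order = np.argsort(d, axis=1)
        rows = np.arange(len(X))
        d1 = d[rows, order[:, 0]]  # distance to the nearest centroid
        d2 = d[rows, order[:, 1]]  # distance to the second-nearest centroid
        # Confidence in [0, 1]: near 0 when the two nearest centroids are equidistant.
        conf = 1.0 - d1 / np.maximum(d2, 1e-12)
        return self.classes_[order[:, 0]], conf


def incremental_diagnosis(X_lab, y_lab, X_new, expert_label, tau=0.3):
    """Diagnose X_new, refer low-confidence samples to an expert, then retrain.

    expert_label(i) is a hypothetical callback returning the expert's label
    for sample i of X_new; tau is a hypothetical confidence threshold.
    """
    model = NearestCentroidDiagnoser().fit(X_lab, y_lab)
    pred, conf = model.predict_with_confidence(X_new)
    final = pred.copy()
    referred = np.flatnonzero(conf < tau)
    for i in referred:
        final[i] = expert_label(i)
    # Assumption: only expert-labeled samples are added back to the labeled set.
    X_lab = np.vstack([X_lab, X_new[referred]])
    y_lab = np.concatenate([y_lab, final[referred]])
    model.fit(X_lab, y_lab)
    return final, referred, model


# Toy usage: two well-separated classes and one ambiguous test sample.
X_lab = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
y_lab = np.array([0, 0, 1, 1])
X_new = np.array([[0.1, 0.1], [2.0, 2.1], [4.1, 3.9]])
truth = np.array([0, 1, 1])  # labels the "expert" would return
final, referred, model = incremental_diagnosis(X_lab, y_lab, X_new, lambda i: truth[i])
print(final, referred)  # -> [0 1 1] [1]
```

In this toy run only the sample lying midway between the two centroids is referred to the expert; the other two are diagnosed automatically. Raising `tau` refers more samples (favoring accuracy over system utilization) and lowering it refers fewer, which is the trade-off the abstract describes.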
Keywords/Search Tags: DNA Microarray, Cancer Classification, Feature Gene Selection, Ensemble Classification, Swarm Intelligence, Multiclass Classification