The Research Of Gastric Cancer Feature Genes Selection Based On Gene Expression Data

Posted on: 2010-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
GTID: 2144360275951208
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
Gastric cancer is one of the common malignant tumors that threaten human health, and detecting it early is of great significance for both diagnosis and treatment. As DNA microarray technology has developed, research on gastric cancer at the molecular level has advanced. The information and knowledge derived from gene expression data give a better understanding of both the nature of gastric cancer and the relationship between gastric cancer and genes. This knowledge also plays a pivotal role in improving the clinical diagnosis and treatment of gastric cancer, in furthering its study, and in uncovering its pathogenesis. This thesis uses gastric cancer gene expression data collected by Beijing Cancer Hospital, all of which come from China, as its experimental data. Based on these data, the thesis covers three topics: classification of gastric cancer samples versus normal samples, classification of intestinal-type versus diffuse-type gastric cancer, and feature gene selection. A number of feature genes are selected, for example CHRNA4 and MEA1. These genes have been clinically shown to be related to the occurrence and development of gastric cancer. The main results are as follows:

1. A hybrid method based on the Bhattacharyya distance is proposed. It combines the advantages of filter and wrapper methods, and experimental results show that it is effective. The method has two steps. First, the genes are ranked by Bhattacharyya distance. This distance is considered a good information measure because it accounts for differences in both the means and the variances of the samples. Second, redundant genes are removed using SFFS (Sequential Floating Forward Selection).
At the end of the experiment, the method selected 7 feature genes that separate intestinal-type from diffuse-type gastric cancer well, and 9 feature genes that separate gastric cancer samples from normal samples well.

2. A PLS coefficient method is proposed. PLS (partial least squares) and PCA (principal component analysis) reduce the dimensionality of the gastric cancer gene data well, but the feature vectors they extract cannot be interpreted. The PLS coefficient method is proposed to remedy this lack of interpretability in PLS and PCA. Experimental results show that the method is effective and can also interpret the feature vectors. It selects 20 feature genes.

3. The Top Scoring Pair (TSP) method, proposed by Donald Geman et al. in 2004, is successfully applied to the Chinese gastric cancer gene expression data. According to the literature search, it had not previously been applied to a gastric cancer data set. In the experiment, the method selected 11 gene pairs from 21,378 genes.

Finally, the thesis compares the three methods: the TSP method, the hybrid method based on the Bhattacharyya distance, and the PLS coefficient method.
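The filter step of the hybrid method ranks genes by the Bhattacharyya distance between the two classes. The minimal pure-Python sketch below models each gene's expression within a class as a univariate Gaussian. The function names, the `eps` guard against zero variance, the 0/1 label coding, and the samples-by-genes list layout are illustrative assumptions, not the thesis's code. The first term of the distance captures the difference in means and the second captures the difference in variances, which is the property the abstract cites.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_distance(a, b, eps=1e-12):
    """Bhattacharyya distance between univariate Gaussians fitted to samples a and b.

    D_B = 1/4 * (m1 - m2)^2 / (v1 + v2) + 1/2 * ln((v1 + v2) / (2 * sqrt(v1 * v2)))
    """
    m1, m2 = mean(a), mean(b)
    # eps (an assumption) keeps the log finite when a gene is constant within a class.
    v1, v2 = pvariance(a) + eps, pvariance(b) + eps
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)               # difference of means
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))  # difference of variances
    return mean_term + var_term


def rank_genes_by_bhattacharyya(X, labels):
    """Filter step: gene indices sorted from most to least discriminative.

    X is a list of samples, each a list of gene expression values; labels are 0 or 1.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, y in zip(X, labels) if y == 0]
        b = [row[g] for row, y in zip(X, labels) if y == 1]
        scores.append(bhattacharyya_distance(a, b))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
```

In the hybrid scheme, only the top of this ranking would then be passed to the wrapper (SFFS) step. How many genes were kept at this stage is not stated in the abstract.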
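The wrapper step removes redundancy with Sequential Floating Forward Selection. The sketch below is a generic SFFS in pure Python that takes the subset-scoring function as a parameter. In a wrapper setting that function would be a classifier's accuracy on the candidate genes, but the abstract does not name the classifier, so the `criterion` argument and the toy scoring function in the usage note are hypothetical. Each forward step adds the best single feature. A conditional backward step then keeps dropping the least useful feature for as long as the smaller subset beats the best one previously found at that size.

```python
def sffs(n_features, criterion, k):
    """Sequential Floating Forward Selection of k features maximising `criterion`.

    `criterion` maps a list of feature indices to a score (higher is better).
    Requires 1 <= k <= n_features.
    """
    best = {}  # subset size -> (best score seen, that subset)
    selected = []

    def record(subset):
        """Store subset if it beats the best known subset of its size."""
        score = criterion(subset)
        size = len(subset)
        if size not in best or score > best[size][0]:
            best[size] = (score, list(subset))
            return True
        return False

    while len(selected) < k:
        # Inclusion: add the single feature that gives the highest criterion value.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: criterion(selected + [f]))
        selected = selected + [f_add]
        if not record(selected):
            # Fall back to the best subset already known at this size.
            selected = list(best[len(selected)][1])
        # Conditional exclusion: drop the least significant feature while the
        # reduced subset strictly beats the best subset recorded at that size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            if record(reduced):
                selected = reduced
            else:
                break
    return best[k][1]
```

As a usage example, take a toy criterion that sums per-feature weights `[3.0, 2.9, 2.0, 0.0]` and subtracts a penalty when the redundant features 0 and 1 appear together. With this criterion, `sffs(4, crit, 2)` returns features 0 and 2 rather than the two individually strongest features. Because record updates must be strict improvements, the floating loop always terminates.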
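The abstract does not spell out how the PLS coefficient method works. One plausible reading is to fit a PLS regression of the class label on the gene expression values and then rank genes by the magnitude of their regression coefficients, so that the latent components map back to individual genes. The sketch below is a pure-Python PLS1 regression using the NIPALS algorithm under that assumption. The function names and the 0/1 coding of the response are hypothetical. In NIPALS PLS1 the matrix P^T W is upper triangular with a unit diagonal, so the coefficients beta = W (P^T W)^{-1} q are obtained by back-substitution.

```python
import math


def pls1_coefficients(X, y, n_components):
    """PLS1 regression coefficients (NIPALS) for X (samples x genes) and response y."""
    n, n_genes = len(X), len(X[0])
    # Centre the columns of X and centre y, as PLS regression assumes.
    means = [sum(row[j] for row in X) / n for j in range(n_genes)]
    Xa = [[row[j] - means[j] for j in range(n_genes)] for row in X]
    y_mean = sum(y) / n
    ya = [v - y_mean for v in y]
    W, P, q = [], [], []
    for _ in range(n_components):
        # Weight vector w = X^T y / ||X^T y||.
        w = [sum(Xa[i][j] * ya[i] for i in range(n)) for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # y is already fully explained
        w = [v / norm for v in w]
        t = [sum(Xa[i][j] * w[j] for j in range(n_genes)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_vec = [sum(Xa[i][j] * t[i] for i in range(n)) / tt for j in range(n_genes)]
        q_a = sum(ya[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        Xa = [[Xa[i][j] - t[i] * p_vec[j] for j in range(n_genes)] for i in range(n)]
        ya = [ya[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(p_vec)
        q.append(q_a)
    A = len(W)
    # R = P^T W is upper triangular, so solve R z = q by back-substitution.
    R = [[sum(P[a][j] * W[b][j] for j in range(n_genes)) for b in range(A)] for a in range(A)]
    z = [0.0] * A
    for a in reversed(range(A)):
        z[a] = (q[a] - sum(R[a][b] * z[b] for b in range(a + 1, A))) / R[a][a]
    return [sum(W[a][j] * z[a] for a in range(A)) for j in range(n_genes)]


def rank_genes_by_pls(beta):
    """Gene indices ordered by decreasing absolute PLS coefficient."""
    return sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
```

When `n_components` equals the rank of the centred X, PLS1 reproduces the ordinary least-squares coefficients. The interpretive gain therefore comes from using few components, where each coefficient still refers to a single gene.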
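The TSP method scores every gene pair (i, j) by |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|. It then classifies a new sample by checking whether X_i < X_j holds for the top pair. The pure-Python sketch below follows that definition. It returns all pairs sorted by score so that several top pairs can be inspected, but it implements no tie-breaking rule. The function names and the 0/1 label coding are illustrative assumptions. With 21,378 genes there are roughly 2.3 × 10^8 pairs, so a real run would need a vectorised or compiled implementation.

```python
from itertools import combinations


def pair_probability(X, labels, cls, i, j):
    """Empirical P(X_i < X_j) among the samples of class `cls`."""
    rows = [x for x, y in zip(X, labels) if y == cls]
    return sum(1 for x in rows if x[i] < x[j]) / len(rows)


def tsp_scores(X, labels):
    """All gene pairs as (score, i, j, p0, p1), sorted by decreasing score."""
    n_genes = len(X[0])
    scores = []
    for i, j in combinations(range(n_genes), 2):
        p0 = pair_probability(X, labels, 0, i, j)
        p1 = pair_probability(X, labels, 1, i, j)
        scores.append((abs(p0 - p1), i, j, p0, p1))
    scores.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep pair order
    return scores


def tsp_predict(x, pair):
    """Classify sample x using one scored pair from tsp_scores."""
    _, i, j, p0, p1 = pair
    # If X_i < X_j is more frequent in class 1, predict 1 exactly when it holds.
    if x[i] < x[j]:
        return 1 if p1 > p0 else 0
    return 0 if p1 > p0 else 1
```

Because the rule only compares two genes within the same sample, it is invariant to monotone per-sample normalisation. This is one reason rank-based pair classifiers are attractive for microarray data gathered across batches.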
Keywords/Search Tags:Feature genes selection, Gastric cancer classification, Gene expression data, Bioinformatics