Biomolecular feature selection of colorectal cancer microarray data using GA-SVM hybrid and noise perturbation to address overfitting

Posted on: 2010-08-09
Degree: M.S.
Type: Thesis
University: State University of New York at Binghamton
Candidate: Mizaku, Alda
GTID: 2444390002985492
Subject: Engineering
Abstract/Summary:
In 2008, more than 100,000 new cases of colon cancer and 40,000 cases of rectal cancer were reported in the United States. To reduce the number of deaths from these diseases, researchers have been striving to find a set of genes that can accurately predict the prognosis for colorectal cancer. Working with a gene expression microarray dataset of about 55,000 genes, collected from 122 colorectal cancer patients, this research developed a method for identifying an optimal set of features through several stages of feature selection: coarse feature reduction, fine feature selection, and classification using a Genetic Algorithm/Support Vector Machine (GA/SVM) hybrid. Microarray data of these dimensions, however, are feature rich and case poor, which creates a risk of overfitting. To combat this issue, a noise perturbation scheme was introduced on the assumption that genes able to survive the added noise have a strong relation to colorectal cancer. The feature reduction methods produced GA chromosomes (candidate gene subsets) containing genes with a known relation to cancer, but the perturbation analysis designed to confirm these genes was deemed inconclusive. The research nonetheless succeeded in developing a feature reduction method that suggests a set of genes with potential ties to colorectal cancer and that warrants further investigation into this relationship.
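The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes synthetic two-class data, uses a nearest-centroid classifier as a plain stand-in for the SVM, and treats noise perturbation as "add Gaussian noise, rerun the genetic algorithm, and count how often each feature survives". All function names, parameters, and data sizes here are hypothetical.

```python
import random

def make_data(rng, n=20, p=12, informative=3):
    """Synthetic two-class data; only the first few features carry signal (assumption)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(informative):
            row[j] += 2.0 * label  # class shift on the informative features
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
        counts = {0: 0, 1: 0}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            counts[y[k]] += 1
            for c, j in enumerate(cols):
                sums[y[k]][c] += X[k][j]
        dist = {
            label: sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))
            for label in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == y[i]
    return correct / len(X)

def run_ga(X, y, rng, pop_size=12, generations=8, mutation_rate=0.1, penalty=0.01):
    """Tiny GA over feature bitmasks; fitness is accuracy minus a size penalty."""
    p = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(ranked, 3), key=score)  # tournament selection
            b = max(rng.sample(ranked, 3), key=score)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)

def perturbation_counts(X, y, runs=3, noise_sd=0.5, seed=0):
    """Count how often each feature is selected across noise-perturbed GA runs."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(runs):
        noisy = [[v + rng.gauss(0.0, noise_sd) for v in row] for row in X]
        best = run_ga(noisy, y, rng)
        counts = [c + bit for c, bit in zip(counts, best)]
    return counts

X, y = make_data(random.Random(1))
print(perturbation_counts(X, y))  # per-feature survival counts, one per column
```

Features with high survival counts are the ones this kind of scheme would flag as robust to noise. On real microarray data the coarse reduction step would have to shrink the roughly 55,000 genes first, since a search over bitmasks of that size is impractical.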
Keywords/Search Tags: Cancer, Feature, Genes, Microarray, Data, Noise, Perturbation