
Feature Selection Algorithm Based On Network Biomarkers And Its Application In Cancer Detection

Posted on: 2021-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Li
Full Text: PDF
GTID: 2404330629452716
Subject: Software engineering
Abstract/Summary:
With advances in high-throughput data storage and the maturing of sequencing technologies in bioinformatics, the dimensionality of biological data is growing explosively. How to mine useful information from such massive data is a central problem in bioinformatics and data mining. Human gene expression profiles carry important information about the causes of disease, but the human body contains a very large number of genes. If biomarkers related to a given disease can be mined from this large gene pool, they can both advance research on the disease and support better clinical treatment. In a gene expression data set, genes related to a disease often show different expression between normal and diseased samples, so once these genes are found, a machine learning algorithm can classify samples on the basis of them and thereby detect and predict the disease.

In bioinformatics, genes with similar functions often act together and can be regarded as a whole. Such genes show numerical correlation in their expression profiles, and groups of them are called network biomarkers. Building on the theory of network biomarkers, this thesis measures the correlation between genes with cosine similarity and constructs network biomarkers from it. It then combines an embedded feature selection algorithm with a sequential (serialized) feature selection strategy to design a feature selection algorithm whose main evaluation criterion is the classification performance of a machine learning model. The algorithm greatly reduces the feature dimensionality of the data, selects gene subsets from a large number of genes, and detects diseases effectively.

The thesis also compares network biomarkers with single-factor (single-molecule) biomarkers in classification and finds that the feature selection algorithm based on network biomarkers uses fewer features while achieving similar classification performance, which is of some significance for precision medicine. In comparisons with other feature selection algorithms, the algorithm based on network biomarkers classified and predicted cancer data more effectively in most scenarios.
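
The idea of relating genes by cosine similarity can be illustrated with a small sketch. It is not the thesis's actual algorithm: the abstract does not say how similar genes are grouped, so the similarity cutoff (`threshold`), the grouping by connected components, the function names, and the toy expression values below are all assumptions made for illustration. The toy values are assumed to be centered already, so negative entries are possible.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two expression vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def build_gene_modules(expression, threshold=0.95):
    """Group genes whose pairwise cosine similarity reaches `threshold`.

    expression: dict mapping gene name -> list of values across samples.
    threshold:  hypothetical cutoff; the abstract does not give a value.
    Returns a sorted list of gene groups (connected components of the
    similarity graph), each group itself sorted by gene name.
    """
    genes = sorted(expression)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        # Follow parent links to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Link every pair of genes that is similar enough.
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if cosine_similarity(expression[a], expression[b]) >= threshold:
                parent[find(b)] = find(a)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(modules.values())


# Toy, made-up expression values for five hypothetical genes, four samples.
toy_expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.1, 6.0, 8.2],
    "G3": [4.0, -1.0, 0.5, -2.0],
    "G4": [3.9, -1.1, 0.6, -2.1],
    "G5": [0.0, 1.0, -1.0, 0.0],
}
toy_modules = build_gene_modules(toy_expression, threshold=0.95)
```

Each resulting group could then be treated as one candidate feature, for example by averaging its genes per sample, before a classifier-driven selection step decides which groups to keep. That aggregation choice is likewise an assumption, not something stated in the abstract.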
Keywords/Search Tags:Gene expression profile, Network biomarker, Machine learning, Feature selection