
The Research On Potential Biomarker Selection Algorithms Based On Network Analysis

Posted on: 2022-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Qi
GTID: 2480306509484574
Subject: Computer Science and Technology
Abstract/Summary:
Genomics, proteomics and metabonomics are important parts of systems biology. Identifying biomarkers that reflect physiological and pathological changes from omics data is of great significance for disease diagnosis, early warning, and drug target prediction. Omics data usually contain many features but only a small number of samples, and identifying the biomarkers that accurately reflect the nature of disease occurrence and development is one of the central tasks of omics research. This dissertation uses network analysis to identify potential biomarkers from omics data. The main work is as follows:

A feature selection algorithm, FS-DANI, is proposed, based on feature distinguishing ability and feature influence in the biological network. Disease-related biomolecules (features) often lie in important functional modules of the biological network and are highly involved in those modules. FS-DANI evaluates the distinguishing ability of a feature from the overlapping area of the feature's effective ranges in the different classes. It measures the influence of a feature in the biological network from the importance of the modules in the feature correlation network and the centrality of the feature within those modules. The final feature importance combines the distinguishing ability and the influence in the network. FS-DANI is compared on 10 public datasets with univariate feature selection methods (ReliefF, ERGS), multivariate feature selection methods (mRMR, SVM-RFE), and network-analysis-based feature selection methods (ATSD-DN, INDEED), all of which are suited to high-dimensional, small-sample data. The experimental results show that FS-DANI is superior to the compared algorithms in accuracy, sensitivity and specificity on most datasets.

A feature selection algorithm, FS-SN, is proposed, based on the sample network. FS-SN evaluates the distinguishing ability of a feature from the topological structure of the sample network, which is constructed from the distances among the samples on that feature. In the sample network of a feature with strong distinguishing ability, there are many edges between samples of the same class and few edges between samples of different classes. FS-SN removes redundant features by means of the gravitation between features. FS-SN is compared on the same 10 public datasets with the same univariate (ReliefF, ERGS), multivariate (mRMR, SVM-RFE), and network-analysis-based (ATSD-DN, INDEED) methods. The experimental results show that FS-SN can effectively select a feature subset related to the class label and is superior to the compared algorithms in accuracy, sensitivity and specificity on most datasets.

Both FS-DANI and FS-SN use network analysis to select important features from omics data. FS-DANI constructs a biological network from the correlations between molecules and evaluates feature importance from both the feature distinguishing ability and the feature influence in that network. FS-SN builds a sample network for each feature from the distances among the samples on that feature and evaluates the feature distinguishing ability from the topological structure of the sample network. The experimental results show the effectiveness of the two methods. In the analysis of omics data, network analysis helps in understanding, at the network level, how biomolecules change during disease occurrence and development.
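
The two per-feature scoring ideas described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract gives no formulas, so the range width (`gamma` standard deviations around the class mean, a choice borrowed from ERGS-style effective ranges), the k-nearest-neighbour edge rule, and the function names `range_overlap_score` and `sample_network_score` are all assumptions made for illustration. Module importance, centrality, and the gravitation-based redundancy removal are not covered.

```python
from statistics import mean, pstdev


def range_overlap_score(values, labels, gamma=1.732):
    """Toy distinguishing score from class-wise ranges (two classes only).

    Each class gets the interval mean +/- gamma * std (an assumed definition
    of "effective range"). The score is 1 minus the overlap fraction, so a
    value near 1 means the two class ranges barely overlap.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    intervals = []
    for c in classes:
        v = [x for x, lab in zip(values, labels) if lab == c]
        m, s = mean(v), pstdev(v)
        intervals.append((m - gamma * s, m + gamma * s))
    (lo_a, hi_a), (lo_b, hi_b) = intervals
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    return 1.0 - overlap / span if span > 0 else 0.0


def sample_network_score(values, labels, k=2):
    """Toy sample-network score for one feature.

    Samples are nodes. Each sample is linked to its k nearest samples by
    absolute difference on this feature (an assumed construction). The score
    is the fraction of edges that join samples of the same class.
    """
    n = len(values)
    edges = set()
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # undirected edge, stored once
    if not edges:
        return 0.0
    same_class = sum(1 for i, j in edges if labels[i] == labels[j])
    return same_class / len(edges)


if __name__ == "__main__":
    informative = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    informative_labels = [0, 0, 0, 1, 1, 1]
    noisy = [0, 1, 2, 3, 4, 5]
    noisy_labels = [0, 1, 0, 1, 0, 1]
    for name, x, y in [("informative", informative, informative_labels),
                       ("noisy", noisy, noisy_labels)]:
        print(name, range_overlap_score(x, y), sample_network_score(x, y))
```

On the toy data, the feature that separates the two classes scores higher under both measures than the feature whose values interleave across classes, which is the behaviour the abstract attributes to a feature with strong distinguishing ability.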
Keywords/Search Tags: Omics data, Feature Selection, Network Analysis