Research On Predicting Subcellular Localization Of Apoptosis Proteins Based On Machine Learning

Posted on: 2018-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2310330512989523
Subject: Computer technology
Abstract/Summary:
Apoptosis, also called programmed cell death, is the final stage of cell life. It is an important part of many biological processes and plays an important role in maintaining the balance of biological tissue. Apoptosis proteins are central to the mechanism of programmed cell death, and knowing their subcellular locations helps us understand that mechanism. With the exponential growth in the number of proteins, annotation by biological experiments can no longer meet researchers' demand, so more and more researchers are turning to machine learning methods to predict protein subcellular localization. This dissertation studies machine learning methods for predicting the subcellular localization of apoptosis proteins in depth. The main contributions are summarized as follows:

(1) To address the problem that feature extraction methods based on sequence information cannot improve prediction accuracy, this dissertation describes each protein using the GO annotation information of the protein and its homologous proteins instead of sequence information. The experimental results show that the proposed method performs significantly better than other existing methods. In addition, an online prediction web server is provided for researchers.

(2) The class distribution in the CL317 dataset is seriously imbalanced. Previous studies in the machine learning field have shown that directly applying traditional machine learning algorithms tends to bias the classifier toward the majority class, which results in poor classification performance on the minority classes. A predictor called GOIL-Apo is developed to address this problem. It combines random under-sampling with multi-class SVMs and constructs a GO subspace to handle the high dimensionality of the features. The experimental results show that solving the imbalance problem helps to improve prediction, and that the prediction performance is significantly better than that of other existing methods.

(3) Researchers have mainly focused on predicting the subcellular location of apoptosis proteins with a single location and have neglected apoptosis proteins with multiple locations. This dissertation therefore further studies the prediction of subcellular localization for apoptosis proteins with multiple locations. A new stringent benchmark dataset containing apoptosis proteins with multiple locations is constructed, and a novel prediction method that uses label-specific features is proposed. Experimental results show that, by selecting the most relevant features for each location, the proposed method models the multi-label characteristics of proteins well and thus achieves superior performance. This study is the first to deal with apoptosis proteins with multiple locations, and thus provides an important reference for predicting their subcellular localization.
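
The idea in contribution (2), random under-sampling combined with an ensemble of classifiers, can be illustrated with a minimal sketch. This is not the GOIL-Apo implementation: the toy GO-term indicator vectors, the location labels, and all function names are hypothetical, and a simple nearest-centroid learner stands in for the multi-class SVMs so that the example needs only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def undersample(labels, rng):
    """Return indices of a class-balanced subset: every class is randomly
    reduced to the size of the smallest class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return sorted(chosen)


def train_centroids(X, y):
    """Stand-in base learner (NOT an SVM): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(model[c], x))
    return min(sorted(model), key=dist2)


def train_ensemble(X, y, n_models=5, seed=0):
    """Train one base learner per randomly under-sampled balanced subset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = undersample(y, rng)
        models.append(train_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced toy data: each row marks which of three
# GO terms annotate a protein; "cyto" is the majority class.
X = [[1, 0, 0] for _ in range(6)] + [[0, 1, 0] for _ in range(2)] + [[0, 0, 1] for _ in range(2)]
y = ["cyto"] * 6 + ["mito"] * 2 + ["nucl"] * 2

models = train_ensemble(X, y, n_models=5, seed=0)
print(predict_ensemble(models, [0, 1, 0]))
```

Because each base learner sees the same number of examples from every class, no single learner is dominated by the majority class, and the vote across several random subsets recovers some of the information discarded by under-sampling.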
Keywords/Search Tags:Apoptosis proteins, Subcellular localization, Machine learning, Data imbalance