
Key Protein Identification Studies Based On Ensemble Learning

Posted on: 2021-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2510306200453534
Subject: Computer technology
Abstract/Summary:
Essential proteins are indispensable for cell survival. Identifying them improves our understanding of how a cell works and plays a vital role in biology and drug design. Researchers have proposed machine learning and ensemble learning methods that identify essential proteins by introducing effective protein features, namely network topology features extracted from protein-protein interaction network data together with other biological information. However, existing ensemble learning methods combine their base classifiers only by simple average weighting, and their research focuses mainly on the selection of base classifiers. Using ensemble learning algorithms to identify essential proteins therefore still has significant research value. In this thesis, we develop a new ensemble learning framework, Multi-ensemble, to identify essential proteins. The framework adopts the idea of multi-view learning and improves recognition performance by integrating multiple different base classifiers. The training samples of each base classifier in the model are not fixed but are determined by the other base classifiers: a base classifier is trained by continuously adding the samples that the other base classifiers predict correctly. A new logistic regression classifier then integrates the base classifiers to obtain the final prediction. We ran experiments on Saccharomyces cerevisiae (yeast) data and E. coli data. The results show that the proposed method achieves better recognition results than individual classifiers and other ensemble learning methods. To identify essential proteins more accurately, this thesis also adds a feature extraction step on top of the Multi-ensemble model. Because capsule neural networks can extract spatial features, this thesis uses a capsule neural network to extract 16-dimensional feature vectors as enhanced features, which are combined with the original feature data as input to the Multi-ensemble model. The experimental results show that after adding the enhanced features, the sensitivity (SN) and F-score of the model both improve by 12%, the accuracy increases by 5%, and the AUC increases by 8%, indicating that the model can effectively identify essential proteins.
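The training scheme described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the two toy base classifiers, the synthetic two-cluster data, the seed sets, the per-round cap (`batch`), and all function names are assumptions made purely for illustration. Each base classifier starts from its own small seed set, repeatedly absorbs training samples that the other base classifiers label correctly, and a logistic regression over the base-classifier scores produces the final prediction.

```python
import math
import random

class CentroidClassifier:
    """Toy base classifier (assumed): scores by relative distance to class centroids."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self
    def score(self, x):
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return d0 / (d0 + d1 + 1e-12)  # nearer to the class-1 centroid -> higher score
    def predict(self, x):
        return int(self.score(x) >= 0.5)

class KNNClassifier:
    """Toy base classifier (assumed): fraction of positive labels among k neighbours."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def score(self, x):
        nearest = sorted(zip(self.X, self.y), key=lambda p: math.dist(x, p[0]))[:self.k]
        return sum(t for _, t in nearest) / len(nearest)
    def predict(self, x):
        return int(self.score(x) >= 0.5)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_multi_ensemble(X, y, bases, seeds, rounds=5, batch=10):
    # Each base classifier starts from its own seed set of labelled samples.
    train_idx = [set(s) for s in seeds]
    for clf, idx in zip(bases, train_idx):
        rows = sorted(idx)
        clf.fit([X[j] for j in rows], [y[j] for j in rows])
    for _ in range(rounds):
        for i, clf in enumerate(bases):
            others = [c for k, c in enumerate(bases) if k != i]
            # Add samples that every *other* base classifier predicts correctly.
            new = [j for j in range(len(X))
                   if j not in train_idx[i]
                   and all(o.predict(X[j]) == y[j] for o in others)][:batch]
            train_idx[i].update(new)
            rows = sorted(train_idx[i])
            clf.fit([X[j] for j in rows], [y[j] for j in rows])
    # Logistic regression (plain SGD) over the base-classifier scores.
    Z = [[c.score(x) for c in bases] for x in X]
    w, b = [0.0] * len(bases), 0.0
    for _ in range(200):
        for z, t in zip(Z, y):
            g = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            w = [wi - 0.5 * g * zi for wi, zi in zip(w, z)]
            b -= 0.5 * g
    return w, b, train_idx

def predict(x, bases, w, b):
    z = [c.score(x) for c in bases]
    return int(sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5)

# Synthetic stand-in data (assumed): 30 "non-essential" and 30 "essential" points.
random.seed(0)
X = [[random.gauss(c, 1.0), random.gauss(c, 1.0)] for c in (0.0, 4.0) for _ in range(30)]
y = [0] * 30 + [1] * 30
seeds = [[0, 1, 2, 30, 31, 32], [3, 4, 5, 33, 34, 35]]
bases = [CentroidClassifier(), KNNClassifier(k=3)]
w, b, train_idx = train_multi_ensemble(X, y, bases, seeds)
accuracy = sum(predict(x, bases, w, b) == t for x, t in zip(X, y)) / len(X)
```

In this sketch the combiner is fitted on the same samples used to train the base classifiers, which keeps the example short but would overestimate performance in practice; a held-out split or cross-validated scores would be the usual remedy. Both toy classifiers also see the same features, whereas a multi-view setup would typically give each one a different feature subset.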
Keywords/Search Tags: Essential proteins, ensemble learning, multi-view learning, feature extraction, enhanced features, capsule neural network