
Research on a Protein Complex Recognition Algorithm Based on Supervised Learning

Posted on: 2022-03-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Z W Zhao
GTID: 2480306332457944 | Subject: Software engineering
Abstract/Summary:
Proteins are an important material basis of life activities and are also the executors and regulators of those activities. A few proteins can perform specific functions in an organism on their own, but most accomplish their functions as complexes formed through interactions with other proteins. Accurate and efficient identification of protein complexes is therefore of great significance for revealing the organizing principles and functional mechanisms of cells, and it can help guide the diagnosis and targeted therapy of complex diseases. Drawing on bioinformatics theory and machine learning, this thesis studies the problem of recognizing protein complexes in protein-protein interaction (PPI) networks.

Current methods for identifying protein complexes can be divided into experimental methods and computational methods. Experimental methods are costly in both time and money and can hardly meet the needs of large-scale application, so complex recognition algorithms based on computational methods have attracted increasing attention from researchers. By their core ideas, existing algorithms fall roughly into four categories: methods based on clustering and dense subgraphs, model-based methods, methods based on hierarchical clustering, and methods based on supervised learning. These methods can identify protein complexes to some extent, but they still have shortcomings: the edges of the noisy PPI network are not weighted; the structural characteristics of real complexes in the network, and the information these complexes carry, are not taken into account; and search efficiency is neglected during the complex search.

To address these shortcomings, this thesis proposes a protein Complex Recognition algorithm based on Supervised Learning (CRSL). CRSL integrates the core ideas of supervised-learning-based and structural-information-based complex recognition methods, and improves the factors that limit algorithm efficiency. First, CRSL weights each protein interaction edge using biological information and topological structure information, producing a weighted PPI network. Next, based on the characteristics of protein complexes in the network, it constructs a feature matrix that uses fewer features while capturing more information about the samples, and uses this matrix to train a supervised learning model. Then, the trained model, together with a structure function containing a penalty term, scores the probability that the current candidate subgraph is a real complex. These scores guide the search for complexes in the network, and a tabu list is introduced during the search to avoid searching the same candidates repeatedly. Finally, the identified complexes are pruned and merged according to preset thresholds.

To verify the effectiveness of the feature matrix constructed by CRSL, experiments compared the features used by CRSL with those used by other algorithms; the results show that the CRSL feature matrix yields higher complex recognition accuracy. To verify that CRSL's choice of supervised learning model is reasonable, three models widely used in supervised learning, namely support vector machine, k-nearest neighbor, and random forest, were compared; with the given feature matrix, the random forest model achieved higher recognition accuracy and greater stability. In a comparison with six other protein complex recognition algorithms, the complexes identified by CRSL matched the real (reference) complexes at a higher rate. Overall, CRSL outperforms the compared algorithms and contributes to research on protein complex recognition algorithms. In addition, the recognition approach of CRSL may be extended to identifying community structure in other, similar complex networks, which will be the focus of future work.
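The CRSL pipeline (weight edges, score candidate subgraphs, grow candidates with a tabu list, then prune and merge) can be illustrated with a small Python sketch. It is not the CRSL implementation: the abstract does not give the edge-weighting formula, the feature matrix, or the structure function. The sketch therefore assumes Jaccard overlap of closed neighbourhoods as the topological weight, a hypothetical `bio_sim(u, v)` hook for biological similarity, a ClusterONE-style cohesiveness w_in / (w_in + w_bound + p·|S|) as the structure function with penalty term p, a hypothetical `prob(S)` hook standing in for the trained classifier (e.g. a random forest's predicted probability), a tabu list of recently moved proteins, and the overlap score |A∩B|²/(|A||B|) for merging. All function names are illustrative.

```python
from collections import deque
from itertools import combinations


def weight_edges(adj, bio_sim=None, alpha=0.5):
    """Weight each edge by mixing a topological score (Jaccard overlap of the two
    closed neighbourhoods) with an optional biological similarity bio_sim(u, v).
    The Jaccard choice and the bio_sim hook are assumptions for illustration."""
    w = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                nu, nv = adj[u] | {u}, adj[v] | {v}
                topo = len(nu & nv) / len(nu | nv)
                bio = bio_sim(u, v) if bio_sim else topo
                w[(u, v)] = alpha * topo + (1 - alpha) * bio
    return w


def cohesiveness(nodes, adj, w, p=2.0):
    """Structure score with a per-protein penalty term p (ClusterONE-style stand-in)."""
    w_in = w_bound = 0.0
    for u in nodes:
        for v in adj[u]:
            e = w[(min(u, v), max(u, v))]
            if v in nodes:
                w_in += e / 2  # internal edges are seen from both endpoints
            else:
                w_bound += e
    total = w_in + w_bound + p * len(nodes)
    return w_in / total if total else 0.0


def tabu_grow(seed, adj, w, prob=None, tenure=3, max_iter=20):
    """Grow a candidate complex from `seed` with single-protein add/remove moves.
    Recently moved proteins are kept in a tabu list so the search does not cycle;
    the best-scoring subgraph seen is returned. `prob` is a hypothetical hook for
    a trained classifier's probability that the subgraph is a real complex."""
    def score(s):
        return (prob(s) if prob else 1.0) * cohesiveness(s, adj, w)

    current = {seed}
    best, best_score = set(current), score(current)
    tabu = deque(maxlen=tenure)  # oldest tabu entry expires automatically
    for _ in range(max_iter):
        frontier = {v for u in current for v in adj[u]} - current
        moves = [(v, current | {v}) for v in frontier if v not in tabu]
        moves += [(u, current - {u}) for u in current if u != seed and u not in tabu]
        if not moves:
            break
        node, current = max(moves, key=lambda m: score(m[1]))  # best non-tabu move
        tabu.append(node)
        s = score(current)
        if s > best_score:
            best, best_score = set(current), s
    return best, best_score


def merge_complexes(candidates, threshold=0.8, min_size=3):
    """Prune candidates smaller than min_size, then repeatedly merge any pair whose
    overlap score |A & B|^2 / (|A| * |B|) reaches the threshold."""
    result = [set(c) for c in candidates if len(c) >= min_size]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(result)), 2):
            a, b = result[i], result[j]
            if len(a & b) ** 2 / (len(a) * len(b)) >= threshold:
                result[i] = a | b
                del result[j]  # j > i, so index i stays valid
                merged = True
                break
    return result


if __name__ == "__main__":
    # Toy network: a 4-clique {0, 1, 2, 3} with a tail 3-4-5.
    adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
    w = weight_edges(adj)
    best, s = tabu_grow(0, adj, w)
    print(sorted(best), round(s, 3))  # [0, 1, 2, 3] 0.393
```

On the toy network, the search from seed 0 passes through the clique, keeps it as the best-scoring subgraph, and then continues exploring without finding anything better. A full run would call `tabu_grow` from every seed protein, pass a classifier's probability through `prob`, and feed the candidates to `merge_complexes`. The penalty p, tabu tenure, and thresholds here are arbitrary and would need tuning on real PPI data.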
Keywords/Search Tags: Protein complex, Protein interaction network, Supervised learning, Complex networks