
The Study Of Protein Complex Detection Algorithm Based On PPI

Posted on: 2018-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: J W Jiang
GTID: 2310330542483633
Subject: Computer application technology
Abstract/Summary:
A protein complex is an assembly of multiple proteins bound together. Life processes are driven by these complexes, so understanding their specific roles is of great importance in biology. However, identifying these complexes with current life-science methods requires considerable time and equipment cost. At the same time, many protein-protein interaction (PPI) networks are available, and complexes can be viewed as communities in these networks. Community detection algorithms from data mining can therefore help identify complexes at lower cost, which makes complex detection based on PPI networks a significant research topic.

Existing protein complex detection algorithms are usually based on the topology of the network graph, combined with data mining methods such as subgraph mining or feature learning. However, these algorithms often fail to account fully for the complexity of protein complexes. Some consider only complexes in dense regions and ignore those in sparse regions. Others tend to miss some of the proteins in a complex or to include redundant ones. Building on previous studies, this thesis therefore develops two more efficient algorithms for detecting protein complexes.

The first algorithm follows the basic idea of finding seed clusters and expanding outwards. To obtain seed clusters effectively, it uses Random Walk with Restart (RWR), which yields the global relevance between all pairs of nodes. Nodes that are closely connected to a given starting node are grouped into a seed cluster, and redundant seed clusters are dropped. The algorithm then expands each seed cluster outwards into a final complex. During expansion, to handle certain special edge proteins, a new comparison method is proposed that weighs the advantages and disadvantages of existing expansion formulas and controls the size of the complex. Finally, expanded complexes that are too similar are merged, while some special small complexes are retained.

The second algorithm is also based on seed finding and expansion. Seed clusters of a certain scale are generated by the same RWR method with a special threshold setting. The algorithm then treats the largest seed cluster in a region as the center of that region, so other seed clusters that share nodes with the largest one are deleted, leaving a set of truly representative seed clusters. This process leaves many remaining nodes unassigned. Using the RWR results, the algorithm computes, for each remaining node, its average relevance to all nodes in each representative seed cluster, and assigns the node to the seed cluster with the highest average.

In experiments on several well-known yeast protein-protein interaction networks, both algorithms presented in this thesis exhibit a certain capacity for complex detection.
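
The abstract does not give formulas, so the following is only a minimal sketch of two of the steps it describes: computing RWR relevance between all node pairs, and assigning the remaining nodes to the seed cluster with the highest average relevance (the final step of the second algorithm). Everything specific here is an assumption rather than the thesis's actual choice: the restart probability of 0.5, the column-normalised transition matrix, the closed-form solution of p = (1 - c) W p + c e by matrix inversion, the hypothetical function names `rwr_matrix` and `assign_rest_nodes`, and the hand-picked seed clusters in the toy example.

```python
import numpy as np


def rwr_matrix(adj, restart=0.5):
    """Return R where R[i, j] is the steady-state relevance of node i
    for a random walk that restarts at node j (assumed formulation)."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # isolated nodes: avoid division by zero
    W = adj / col_sums              # column-normalised transition matrix
    n = adj.shape[0]
    # Closed form of p = (1 - c) W p + c e, solved for every start node at once.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * W)


def assign_rest_nodes(R, seeds):
    """Attach each node outside the seeds to the seed cluster whose
    members have the highest mean relevance to that node."""
    seeded = set().union(*seeds)
    clusters = [set(s) for s in seeds]
    for v in range(R.shape[0]):
        if v in seeded:
            continue
        # Mean relevance of each seed's members for a walk restarting at v.
        means = [R[sorted(s), v].mean() for s in seeds]
        clusters[int(np.argmax(means))].add(v)
    return clusters


# Toy network (assumption): two triangles, 0-1-2 and 3-4-5, joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

R = rwr_matrix(adj)
seeds = [{0, 1}, {4, 5}]            # hand-picked seed clusters for illustration
clusters = assign_rest_nodes(R, seeds)
print(clusters)
```

In this toy network the two remaining nodes, 2 and 3, each join the seed cluster of their own triangle. The matrix inversion is convenient for small graphs; for networks of realistic size an iterative update of the same equation would likely be preferable.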
Keywords/Search Tags: protein-protein interaction, complex detection, Random Walk with Restart