Research On Pairwise Biological Network Alignment Based On Sequence And Topology

Posted on: 2023-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2530306818495254
Subject: Computer Science and Technology
Abstract/Summary:
The development of high-throughput technology has produced a large amount of protein-protein interaction (PPI) network data. PPI networks determine most cell functions, and studying them helps in understanding biological processes and the rules of life activity from a systems perspective. One of the most important and widely studied approaches is to analyze PPI networks through alignment. Network alignment makes it possible to find conserved functional modules, study protein function, and uncover evolutionary relationships. Network alignment is an NP-hard problem. Global alignment of PPI networks looks for homologous proteins by maximizing the similarity of the alignment result, but a key difficulty is how to match similar proteins. Research shows that protein sequence similarity can be used to determine homology, so including it in the similarity calculation helps identify more homologous proteins. However, sequence information is incomplete. This thesis therefore computes the similarity between nodes from a combination of topology and sequence information to guide the generation of alignment results. The work of this thesis is as follows:

1. To find an approximately optimal solution to the network alignment problem, Bat Align, a network alignment algorithm based on the discrete bat algorithm, is evaluated on synthetic and real networks and compared with four well-performing algorithms. The quality of the alignment results is measured with both topological and biological indexes, and their biological significance is further analyzed using the functions of the aligned proteins. The experimental results show that Bat Align obtains alignments of high biological quality and identifies homologous proteins in the network.

2. To improve the consistency between the topological and biological quality of alignment results, PONAL, a population optimization network alignment algorithm that integrates local topology and sequence information, is proposed as an improvement on Bat Align. First, because Bat Align depends too heavily on sequence similarity, the similarity calculation is revised to raise the topological quality of the alignment. Second, to enrich population diversity and improve the quality of the initial population, the initial population is generated so that the alignment of every node is obtained under the guidance of similarity. Third, the objective function combines conserved edges and conserved nodes, which increases the number of conserved edges while taking the biological characteristics of conserved nodes into account. Fourth, new individuals are generated by combining global search with local random search to improve their quality. The experimental results show that PONAL identifies subgraphs with conserved structure in the networks and ensures consistent topological and biological quality.

3. To further improve the alignment quality and efficiency of PONAL, CCSNA, a network alignment algorithm based on the clustering coefficient and sequence information, is proposed. CCSNA works in two steps: the first computes node similarity, and the second uses a search algorithm to generate the alignment. In the first step, node importance is computed by combining the clustering coefficient and degree of each node, and the initial node similarity is then computed by combining node importance with sequence information. In the second step, an interaction score is computed for each node and added to the initial similarity to obtain the alignment similarity. The node pair with the highest alignment similarity is greedily selected as the seed pair and expanded; the alignment similarity of its neighboring nodes is then increased, which raises the probability that the neighbors are aligned with each other and improves topological quality. Seed pairs continue to be selected and expanded according to the updated alignment similarity until all nodes of the source network are aligned. The experimental results show that CCSNA performs best and produces alignments of high topological and biological quality.
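
The two-step idea described for CCSNA (importance-based similarity, then greedy seed-and-extend) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the importance formula, the weight `alpha`, the fixed `boost` that stands in for the interaction score, and all function names are assumptions chosen only to make the steps concrete.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def node_importance(adj):
    """Assumed formula: equal-weight mix of normalised degree and clustering coefficient."""
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0) or 1
    return {
        n: 0.5 * len(adj[n]) / max_deg + 0.5 * clustering_coefficient(adj, n)
        for n in adj
    }


def initial_similarity(adj1, adj2, seq_sim, alpha=0.5):
    """Step 1: mix topological closeness (from importance) with sequence similarity."""
    imp1, imp2 = node_importance(adj1), node_importance(adj2)
    sim = {}
    for u in adj1:
        for v in adj2:
            topo = 1.0 - abs(imp1[u] - imp2[v])  # lies in [0, 1]
            sim[(u, v)] = alpha * topo + (1.0 - alpha) * seq_sim.get((u, v), 0.0)
    return sim


def seed_and_extend(adj1, adj2, sim, boost=0.1):
    """Step 2: greedily pick the best free pair, then reward its neighbour pairs."""
    sim = dict(sim)  # work on a copy so the caller's scores are untouched
    alignment, used = {}, set()
    while True:
        free = [p for p in sim if p[0] not in alignment and p[1] not in used]
        if not free:
            break
        u, v = max(free, key=sim.get)  # current seed pair
        alignment[u] = v
        used.add(v)
        # Assumed update: a fixed boost for every still-free neighbour pair.
        for n1 in adj1[u]:
            for n2 in adj2[v]:
                if n1 not in alignment and n2 not in used:
                    sim[(n1, n2)] += boost
    return alignment


def conserved_edges(adj1, adj2, alignment):
    """Count source edges whose aligned endpoints are also adjacent in the target."""
    count = 0
    for u in adj1:
        for w in adj1[u]:
            if u in alignment and w in alignment and alignment[w] in adj2[alignment[u]]:
                count += 1
    return count // 2  # each undirected edge was visited from both ends


# Toy example: two isomorphic networks (a triangle with one pendant node).
source = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
target = build_adjacency([("w", "x"), ("x", "y"), ("w", "y"), ("y", "z")])
sequence_scores = {("a", "w"): 1.0, ("d", "z"): 1.0}  # made-up sequence similarities

scores = initial_similarity(source, target, sequence_scores)
result = seed_and_extend(source, target, scores)
print(result, conserved_edges(source, target, result))
```

In this sketch, boosting the neighbours of each seed pair is what pulls adjacent nodes toward adjacent partners, so the number of conserved edges tends to rise. A real PPI alignment would take sequence scores from a tool such as BLAST and would need a more efficient candidate search than the exhaustive scan used here.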
Keywords/Search Tags: Protein-protein interaction network, Global network alignment, Bat algorithm, Local topology, Clustering coefficient