Weakly Supervised Protein-protein Interaction Identification Based On Complex Network And Graph Embedding Representation

Posted on:2020-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Mao

GTID:2370330590972674

Subject:Computer Science and Technology

Abstract/Summary:

Protein-protein interaction (PPI) is an important research direction in biomedicine, with great significance for drug discovery and disease diagnosis. Known PPI relations are currently recorded mainly in the literature. With the rapid growth of the medical literature, looking up PPI relations has become difficult for researchers, so automatically identifying protein-protein interactions from the literature has become an important research topic. Commonly used PPI recognition algorithms are based on supervised learning. Although these methods can achieve good results, they require a large amount of labeled data, which makes them difficult to apply in practice.

This thesis therefore proposes a PPI identification method based on weakly supervised learning. First, a professional database is used to collect the target protein pairs and all sentences containing them to construct the signature, and a small number of interacting protein pairs serve as the seed set. Then, features that express the textual relationship are extracted from each sentence as lexical patterns, and each lexical pattern is represented as a vector according to the distributional hypothesis. Next, lexical patterns from the corpus that are similar to the seed lexical patterns are used to construct the candidate set. Finally, the candidate set is evaluated, and protein pairs scoring above a threshold are selected and added to the seed set. This process is iterated, and interactions are recognized through the continued expansion of the seed set. The method needs only a small amount of labeled data to achieve good results, with an F-score of up to 67.35%.

Because the weakly supervised method may introduce noisy protein pairs unrelated to the seed set in each iteration, a problem known as semantic drift, we propose using a complex network model to further evaluate the candidate set, which reduces the noise in each iteration and alleviates semantic drift. Compared with the basic weakly supervised model, precision improves markedly and the F-score also improves, reaching up to 68.14%.

Finally, this thesis proposes generating lexical pattern vectors with a graph embedding method. The method combines the word information contained in the traditional one-hot representation with the semantic relationship information contained in the representation based on the distributional hypothesis, yielding a better representation. Experimental results show that the new representation improves the precision, recall, and F-score of the PPI identification algorithm. At the highest F-score, the three values are 70.96%, 71.00%, and 70.98%, respectively, a clear improvement in model performance.
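
The iterative seed-set expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `cosine` and `bootstrap`, the `pair_patterns` layout, the scoring rule (mean of each pattern's best cosine similarity to any seed pattern), the threshold value, and the toy vectors are all assumptions made for illustration. The complex-network evaluation and the graph-embedding representation are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bootstrap(seed_pairs, pair_patterns, threshold=0.9, max_iters=10):
    """Iteratively expand a seed set of interacting protein pairs.

    pair_patterns maps a protein pair to the vectors of the lexical patterns
    extracted from sentences mentioning that pair (hypothetical layout).
    """
    seeds = set(seed_pairs)
    for _ in range(max_iters):
        # Pattern vectors of every pair currently in the seed set.
        seed_vectors = [v for p in seeds for v in pair_patterns.get(p, [])]
        if not seed_vectors:
            break
        accepted = set()
        for pair, vectors in pair_patterns.items():
            if pair in seeds or not vectors:
                continue
            # Assumed scoring rule: average, over the pair's patterns, of the
            # best similarity to any seed pattern.
            best = [max(cosine(v, s) for s in seed_vectors) for v in vectors]
            score = sum(best) / len(best)
            if score >= threshold:
                accepted.add(pair)
        if not accepted:
            break  # nothing new passed the threshold, so the seed set is stable
        seeds |= accepted
    return seeds


# Toy pattern vectors (illustrative values only, one pattern per pair).
pair_patterns = {
    ("P1", "P2"): [[1.0, 0.0, 0.0]],  # seed pair
    ("P3", "P4"): [[0.9, 0.4, 0.0]],  # close to the seed pattern
    ("P5", "P6"): [[0.7, 0.7, 0.0]],  # close to P3/P4, but not to the seed
    ("P7", "P8"): [[0.0, 0.0, 1.0]],  # unrelated pattern
}

expanded = bootstrap({("P1", "P2")}, pair_patterns, threshold=0.9)
print(sorted(expanded))
```

In this toy run, the pair `("P5", "P6")` is admitted only in the second round, after `("P3", "P4")` has joined the seed set. That kind of step-by-step movement away from the original seeds is what the abstract calls semantic drift, and it is why an additional evaluation of the candidate set can help.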

Keywords/Search Tags:

Protein-protein interaction, weakly supervised learning, lexical pattern, keyword, complex network, graph embedding

Related items

1	Protein Complex Detection In Human PPI Networks Based On Supervised Learning Method
2	Research On Protein Complexes Recognition Algorithm Based On Supervised Learning
3	The Study Of Analysis And Application Of Protein-protein Interaction Data Based On Graph And Complex Networks Theory
4	Protein Complex Detection Based On Data Integration And Supervised Learning Method
5	Research On Biological Network Alignment Based On Network Embedding
6	Protein Function Prediction Based On Deep Learning And Dynamic Word Embedding
7	Research On Prediction Of Protein-protein Interactions Based On Deep Neural Network And Ensemble Learning
8	Research On Technologies In Protein Protein Interaction Text Mining Based On Discriminative Models
9	Research On Identification And Application Of Protein Complexes In Protein-Protein Interaction Networks
10	Research On Algorithm Of Identifying Protein Complexes Based On Protein-protein Interaction Network