Research On Isoform-isoform Interactions Prediction Based On Deep Multi-instance Learning

Posted on: 2022-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Zeng
GTID: 2480306530498134
Subject: Computer software and theory
Abstract/Summary:
Proteins play a crucial role in regulating and executing the life activities of cells. Only a small number of proteins work independently; most proteins perform their functions by interacting with other proteins and molecules. Constructing and analyzing large-scale protein interaction networks not only helps to reveal the mechanisms of molecular interactions but is also a key way to explore protein functions. Although significant progress has been made in the study of protein-protein interactions, existing research still remains at the gene level. The “proteins” in protein interaction studies generally refer to the longest (or canonical) protein produced by a gene through alternative splicing, which ignores the effects of alternative splicing. Alternative splicing is a very common mechanism of gene expression and regulation. It allows a gene to generate one or more different alternatively spliced isoforms through multiple splicing mechanisms, which are then translated into different proteoforms. Alternative splicing can directly or indirectly affect protein-protein interactions by changing the structure of a protein and the composition of its domains. A comprehensive analysis of isoform interaction networks is therefore essential for understanding molecular interactions and exploring protein functions. Traditional gene-level networks cannot be directly extended to isoform-level networks, for two reasons: 1) most traditional genomic data cannot directly provide isoform-level features; 2) there are not enough experimentally validated interacting isoform pairs to serve as a “gold standard” for building and evaluating computational methods. In recent years, the development of RNA-Seq technologies has provided multiple types of isoform-level data, making it possible to predict isoform-level interactions by integrating multiple isoform-level features. This thesis aims to integrate multiple data types effectively and improve the accuracy of isoform interaction prediction. To this end, a deep multi-instance learning framework is proposed to predict the interactions between isoforms. The main contents of this thesis are as follows:

(1) Because experimentally verified isoform interactions are scarce, some existing methods only consider the case in which a gene produces a single alternatively spliced isoform. To solve this problem, this thesis proposes a deep multi-instance learning method for predicting isoform-isoform interactions, named DMIL-Ⅲ. DMIL-Ⅲ handles genes with one or more alternatively spliced isoforms and models isoform-isoform interaction prediction as a multi-instance learning problem. In this framework, a pair of genes is regarded as a “bag”, and the different alternatively spliced isoform pairs formed from the two genes are regarded as the “instances” in that bag. In addition, several types of biological data, namely RNA-Seq, nucleotide sequence, domain-domain interaction, and exon array data, are fused to describe the different isoform pairs. The DMIL-Ⅲ model takes the features of a gene-pair bag as input and employs convolutional neural networks to capture complex features of the different isoform pairs in the same bag. The interaction probability of each isoform pair is then calculated from the extracted features. Because isoform-level interaction labels are lacking, DMIL-Ⅲ uses the multi-instance learning hypothesis to map isoform-level predictions to gene pairs for training and evaluation. Experimental results show that, compared with existing methods for predicting isoform-isoform interactions, DMIL-Ⅲ achieves a significant improvement by integrating different types of data and employing deep convolutional neural networks to extract key features.

(2) In multi-instance learning methods for isoform-isoform interaction prediction, gene-level interaction data are generally used as the “gold standard” for building and evaluating the model. In the “gold standard” data set, interacting gene pairs can be obtained from existing databases, whereas non-interacting gene pairs are generally constructed from genes located in different subcellular compartments. As a result, interacting gene pairs are far fewer than non-interacting gene pairs, which leads to class imbalance. Existing studies on predicting isoform-isoform interactions have not considered this class imbalance problem. To overcome it, this thesis proposes an imbalanced deep multi-instance learning method for isoform-isoform interaction prediction (IDMIL-Ⅲ). IDMIL-Ⅲ also models isoform-level interaction prediction as a multi-instance learning problem, and it integrates RNA-Seq, nucleotide sequence, amino acid sequence, and exon array data to describe the different isoforms. First, IDMIL-Ⅲ employs a convolutional neural network to extract feature representations of the different isoform pairs from the same gene pair. At the same time, IDMIL-Ⅲ introduces an attention model to calculate a weight for each isoform pair. Taking the element-wise product of each isoform pair's weight and feature representation yields an attentive feature map. IDMIL-Ⅲ then applies a convolutional layer to the attentive feature maps and calculates the interaction probability of each isoform pair. Next, IDMIL-Ⅲ adopts the multi-instance learning hypothesis to map isoform-level predictions to the gene level. To account for class imbalance at the gene level, a novel loss function is proposed that reduces the influence of majority-class samples during training. Experimental results show that IDMIL-Ⅲ handles the class imbalance problem effectively, and that the attention mechanism helps improve the accuracy of isoform interaction prediction.
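The bag/instance formulation in (1) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DMIL-Ⅲ itself: the CNN is replaced by a logistic scorer, isoform features are toy vectors, the isoform IDs (such as `A-201`) are hypothetical, and the "multi-instance learning hypothesis" is taken to be the standard MIL assumption (a bag is positive if at least one instance is positive), implemented as max-pooling over instance probabilities. The abstract does not state which aggregation DMIL-Ⅲ actually uses.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[str, str, List[float]]


def make_bag(gene_a: Dict[str, Sequence[float]],
             gene_b: Dict[str, Sequence[float]]) -> List[Instance]:
    """One MIL bag per gene pair: one instance per (isoform of A, isoform of B).

    The instance feature is the concatenation of the two isoform vectors
    (an illustrative choice, not the thesis's fused feature encoding).
    """
    return [
        (iso_a, iso_b, list(fa) + list(fb))
        for (iso_a, fa), (iso_b, fb) in itertools.product(gene_a.items(), gene_b.items())
    ]


def instance_probability(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Stand-in for the CNN scorer: logistic regression on one isoform pair."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bag_probability(instance_probs: Sequence[float]) -> float:
    """Standard MIL assumption: a gene pair interacts if at least one of its
    isoform pairs interacts, so the bag score is the maximum instance score."""
    return max(instance_probs)


def bag_bce(p: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the gene-level label and the bag score."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy example with hypothetical isoform IDs and 2-dimensional isoform features.
gene_a = {"A-201": [0.2, 1.0], "A-202": [0.9, 0.1]}
gene_b = {"B-201": [0.5, 0.5]}
bag = make_bag(gene_a, gene_b)
weights, bias = [1.0, -1.0, 0.5, 0.5], 0.0
probs = [instance_probability(x, weights, bias) for _, _, x in bag]
p_gene = bag_probability(probs)
best_pair = bag[probs.index(p_gene)][:2]
loss = bag_bce(p_gene, label=1)  # gene-level "gold standard" label: interacting
print(len(bag), best_pair)  # 2 ('A-202', 'B-201')
```

Only the gene-pair label enters the loss, so the instance that attains the maximum (here the pair A-202/B-201) is the model's guess at which isoform pair carries the gene-level interaction. This is how a gene-level gold standard can train an isoform-level predictor.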
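The attention step and imbalance handling described in (2) can be illustrated with the following minimal Python sketch. Several parts are assumptions, because the abstract does not specify them. The attention weight is computed as a softmax over a linear score of each isoform pair's features (a common choice in attention-based MIL), applied as a scalar that multiplies the feature vector element-wise. The convolutional layer is replaced by a logistic scorer, and max-pooling is used as the MIL aggregation. The loss is a standard class-weighted binary cross-entropy with inverse-frequency weights; it is a stand-in, not the novel loss proposed in the thesis. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features: Sequence[Vector], w_att: Vector) -> List[float]:
    """One weight per isoform pair: softmax over a linear score (assumed form)."""
    return softmax([dot(w_att, f) for f in features])


def attentive_features(features: Sequence[Vector],
                       weights: Sequence[float]) -> List[List[float]]:
    """Element-wise product of each isoform pair's weight and its feature vector."""
    return [[a * v for v in f] for f, a in zip(features, weights)]


def bag_forward(features: Sequence[Vector], w_att: Vector,
                w_out: Vector, b_out: float) -> Tuple[float, List[float]]:
    """Attention, then a logistic instance scorer (stand-in for the conv layer),
    then max over instances to get the gene-pair probability."""
    attended = attentive_features(features, attention_weights(features, w_att))
    probs = [1.0 / (1.0 + math.exp(-(dot(w_out, f) + b_out))) for f in attended]
    return max(probs), probs


def inverse_frequency_weights(labels: Sequence[int]) -> Tuple[float, float]:
    """Class weights N / (2 * count): the rarer class gets the larger weight."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return len(labels) / (2 * n_pos), len(labels) / (2 * n_neg)


def class_weighted_bce(p: float, label: int, pos_w: float, neg_w: float,
                       eps: float = 1e-7) -> float:
    """Class-weighted binary cross-entropy on the gene-level prediction."""
    p = min(max(p, eps), 1.0 - eps)
    return -(pos_w * label * math.log(p) + neg_w * (1 - label) * math.log(1.0 - p))


# Toy bag of three isoform pairs; equal attention scores give uniform weights.
features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights = attention_weights(features, w_att=[0.0, 0.0])
p_gene, probs = bag_forward(features, w_att=[0.0, 0.0], w_out=[1.0, 1.0], b_out=0.0)

# 1 interacting vs 9 non-interacting gene pairs, as in an imbalanced gold standard.
labels = [1] + [0] * 9
pos_w, neg_w = inverse_frequency_weights(labels)  # 5.0 and 10/18
```

With these weights, misclassifying the single interacting gene pair costs nine times as much as misclassifying a non-interacting one at the same predicted probability. This offsets the 1:9 ratio so that the majority class does not dominate the gradient. Any loss that reweights or down-weights easy majority samples would play the same role in this sketch.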
Keywords/Search Tags: Isoform-isoform interactions prediction, Data fusion, Deep multi-instance learning, Class imbalance