
Research On The Prediction Of LncRNA-protein Interaction Based On Deep Learning Of Multiple Type Features

Posted on: 2022-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Jiao
GTID: 2480306758492074
Subject: Automation Technology
Abstract/Summary:
Long noncoding RNAs (lncRNAs) play a critical role in many key biological processes and participate in complex human diseases through their interactions with proteins. Accurate prediction of lncRNA-protein interactions is therefore important for understanding lncRNAs, cellular regulation, gene expression, and the pathogenesis of various diseases. Identifying lncRNA-protein interactions through high-throughput biological experiments is time-consuming and expensive, so researchers increasingly favor computational methods for predicting them.

Existing computational methods for predicting lncRNA-protein interactions can be roughly classified into network-based methods and machine learning-based methods. Network-based methods require at least one link between two nodes in the network, yet the lncRNA-protein interaction network is composed of a few isolated subnetworks, and the imbalanced degree distribution of its nodes also affects prediction performance. Machine learning-based methods depend on the quality of hand-designed features, and poor feature selection has a strongly negative effect on model predictions.

In this paper, we take the prediction of lncRNA-protein interactions as the research object and integrate multiple types of features with deep learning algorithms to predict lncRNA-protein interactions from several aspects. We propose a deep learning model based on multiple feature types (LGFC-CNN). The model predicts lncRNA-protein interactions comprehensively by using global sequence features, local sequence features, hand-designed features, and structural features. First, a sequence preprocessing method originally used to predict RNA-protein binding sites was improved; on this basis, one-hot encoding and two deep learning modules (GloCNN and LocCNN) were used to encode the raw sequences of lncRNAs and proteins and to extract their features. Meanwhile, multiple combinations of hand-designed lncRNA and protein features were fed into a random forest classifier for comparison, and the hand-designed features that best represent lncRNAs and proteins were identified by analyzing the performance of each combination. After that, the secondary structures, hydrogen bonds, and van der Waals interactions of lncRNAs and proteins were extracted with multiple tools, and the resulting features were unified to a common size by Fourier transform to serve as structural features. To solve the problem of unreasonable negative samples caused by random pairing, we also design a similarity-based negative sample generation strategy. Finally, the four basic modules are integrated into the final model to predict lncRNA-protein interactions comprehensively.

The proposed model is compared with other strong methods on three lncRNA-protein interaction datasets. It achieves 94.14% accuracy on RPI21850, 92.94% accuracy on RPI7317, and 98.19% accuracy on RPI1847, all better than the existing prediction methods. In addition, multiple sets of comparative experiments show that the proposed negative sample generation strategy and the LGFC-CNN strategy of combining multiple types of features are reasonable and effective.

The main contributions of this paper are as follows:
(1) A classification model based on deep learning is proposed, and on this basis multiple types of features are fused to predict lncRNA-protein interactions; the model outperforms traditional machine learning algorithms and other deep learning algorithms.
(2) A negative sample generation strategy is proposed to mitigate the poor reliability of negative samples generated by random matching.
(3) An RNA sequence preprocessing method is improved so that it can be applied to lncRNA and protein sequences, and on this basis high-quality global and local sequence features of lncRNAs and proteins are obtained.
(4) Compared with identifying lncRNA-protein interactions by high-throughput biological experiments, our model has clear advantages in time and cost, and the predicted lncRNA-protein interactions are more statistically significant.
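Two of the feature-preparation steps named in the abstract, one-hot encoding of raw sequences and Fourier-based unification of feature sizes, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `one_hot_encode` and `fourier_fixed_length`, the four-letter RNA alphabet, the all-zero encoding of unknown symbols, and the choice of keeping the magnitudes of the first `n_coeffs` Fourier coefficients are all assumptions made for illustration.

```python
import numpy as np

# Assumed alphabet for RNA; a protein encoder would pass the 20 amino-acid letters.
RNA_ALPHABET = "ACGU"


def one_hot_encode(sequence, alphabet=RNA_ALPHABET):
    """Return a (len(sequence), len(alphabet)) matrix of 0/1 values.

    Symbols outside the alphabet (e.g. 'N') become all-zero rows (an assumption).
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = np.zeros((len(sequence), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(sequence.upper()):
        col = index.get(ch)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


def fourier_fixed_length(signal, n_coeffs=10):
    """Map a variable-length numeric signal to exactly n_coeffs values.

    Keeps the magnitudes of the first n_coeffs real-FFT coefficients and
    zero-pads when the signal yields fewer coefficients (hypothetical scheme).
    """
    values = np.asarray(signal, dtype=np.float64)
    fixed = np.zeros(n_coeffs)
    if values.size == 0:
        return fixed
    spectrum = np.abs(np.fft.rfft(values))
    k = min(n_coeffs, spectrum.shape[0])
    fixed[:k] = spectrum[:k]
    return fixed


if __name__ == "__main__":
    print(one_hot_encode("ACGUN"))
    # Structural signals of different lengths end up with the same size.
    print(fourier_fixed_length([1, 0, 1, 1, 0]).shape)
    print(fourier_fixed_length(np.ones(50)).shape)
```

The point of the second function is that structural descriptors whose length varies with the molecule can be mapped to vectors of one fixed size before being passed to a model; how the thesis selects or normalizes its Fourier coefficients is not stated in the abstract.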
Keywords/Search Tags: lncRNA-protein interaction, convolutional neural network, raw sequence features, hand-designed features, structural features