
Research On Deep Learning Model Of Moonlighting Protein And LncRNA Prediction Based On Multi-source Heterogeneous Feature Fusion

Posted on: 2022-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J N Zhao
GTID: 2480306329498934
Subject: Computer technology
Abstract/Summary:
Moonlighting proteins (MPs), also called multitasking proteins, are special proteins with two or more distinct functions. Studies have found that MPs play important roles in cell regulation, disease mechanisms, biological evolution, metabolism, and other biological processes. In recent years, research on MPs has begun to attract attention, and some experimentally verified MPs have been collected in databases. The function of long non-coding RNA (lncRNA) is likewise an active research field in biology. An lncRNA is a transcript longer than 200 nt that does not encode a protein. Although lncRNAs do not encode proteins, they are involved in many biological processes, including transcriptional silencing, chromosome modification, and nuclear transport. Research on lncRNA function therefore plays an important role in discovering and understanding many biological mechanisms. As with MPs, there is a class of lncRNAs with multiple functions, called moonlighting lncRNAs (Mlncs). Existing studies infer that Mlncs play important roles in areas such as subcellular localization and epigenetics, and Mlncs are also considered significant for the treatment of cancer. Compared with research on MPs, research on Mlncs is still in its infancy, and no database yet collects experimentally verified Mlncs. Our work therefore focuses on constructing predictive models for MPs and Mlncs.

Existing computational models for MP prediction require annotation information on a protein's associations, expression, and functions. However, this annotation is missing for many proteins, which greatly limits the large-scale application of existing algorithms. We therefore developed a sequence-based multimodal deep ensemble learning architecture for de novo MP prediction, named MEL-MP (Multimodal Ensemble Learning for predicting MPs). The work mainly includes: 1) extraction of sequence-based multimodal features (autocorrelation characteristics of sequence composition frequency, evolutionary information, physicochemical properties, and secondary structure information); 2) feature learning and construction of a predictive sub-model for each feature modality; 3) construction of a deep learning architecture for multimodal feature fusion; 4) comparison with existing multimodal feature fusion methods and existing computational MP prediction tools, in which MEL-MP achieved the best prediction performance, with an F1 score of 0.892 under ten-fold cross-validation. To further verify the validity and practicality of the model, we used MEL-MP to identify MPs on a human genome-wide scale and analyzed the predicted MPs in depth from four perspectives: chromosome distribution, disease association, evolutionary history, and functional analysis. This analysis supported the validity of the predictions from a biological point of view. For ease of use, we also developed a web server for MEL-MP.

Existing work on Mlnc prediction is extremely limited and is based on network analysis. The interaction between lncRNAs and proteins (LPI) is one of the most important and complex molecular mechanisms of lncRNAs, so the characteristics of the proteins that interact with an lncRNA are of great significance for inferring its molecular mechanism and understanding its functions. For this reason, the second part of our work constructs a prediction model for Mlncs based on lncRNA-protein interaction information. The main work integrates the GO functions of lncRNA-interacting proteins with the moonlighting properties of proteins as predicted by MEL-MP, our multimodal deep ensemble method for MP prediction. Two unsupervised methods are used to identify human Mlncs: a clustering algorithm based on the similarity of functional annotations, and an enrichment analysis algorithm based on lncRNA-MP associations. The union of the Mlncs identified by the two methods is taken as the final Mlnc prediction result. Finally, based on the predicted Mlnc samples, we built a one-class deep learning model for identifying Mlncs. The results indicate that our work provides a large number of candidates for subsequent research on Mlncs.

The deep learning models based on multi-source heterogeneous information fusion proposed in this thesis help advance the identification and study of MPs and Mlncs, and provide model and data support for further research on both.
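
The enrichment step and the final union described above can be illustrated with a minimal sketch. The abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names, the significance threshold `alpha`, and the toy protein and lncRNA identifiers. No multiple-testing correction is applied, which a real analysis would likely need.

```python
from math import comb


def enrichment_p_value(population, successes, draws, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population: number of background proteins
    successes:  number of background proteins predicted to be MPs
    draws:      number of proteins interacting with one lncRNA
    observed:   number of those interacting proteins that are predicted MPs
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


def mp_enriched_lncrnas(interactions, predicted_mps, background, alpha=0.05):
    """Return lncRNAs whose interacting proteins are enriched for predicted MPs.

    Hypothetical helper: the test and the alpha threshold are assumptions.
    """
    mps_in_background = predicted_mps & background
    enriched = set()
    for lncrna, partners in interactions.items():
        partners = partners & background
        if not partners:
            continue
        observed = len(partners & mps_in_background)
        p = enrichment_p_value(
            len(background), len(mps_in_background), len(partners), observed
        )
        if observed > 0 and p < alpha:
            enriched.add(lncrna)
    return enriched


# Toy data (invented for illustration only).
background = {f"P{i}" for i in range(1, 11)}
predicted_mps = {"P1", "P2", "P3"}  # stand-in for MEL-MP predictions
interactions = {
    "lncA": {"P1", "P2", "P3"},  # all partners are predicted MPs
    "lncB": {"P4", "P5", "P6"},  # no partner is a predicted MP
    "lncC": {"P1", "P7"},        # one of two partners is a predicted MP
}

enriched = mp_enriched_lncrnas(interactions, predicted_mps, background)

# Stand-in for the output of the annotation-similarity clustering method.
clustering_candidates = {"lncC"}

# The final candidate set is the union of the two methods' results.
final_candidates = enriched | clustering_candidates
print(sorted(final_candidates))
```

Taking the union rather than the intersection favors recall: an lncRNA flagged by either method is kept as a candidate, which suits the stated goal of supplying candidates for later study.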
Keywords/Search Tags: Moonlighting Proteins, Moonlighting Long Non-coding RNAs, Deep Learning, Ensemble Learning, Multi-source Heterogeneous Features, De novo Prediction