Prediction of Long Non-coding RNA Subcellular Localization

Posted on: 2023-04-11 | Degree: Master | Type: Thesis
Country: China | Candidate: D X Yan
GTID: 2530306791456854 | Subject: Physics
Abstract/Summary:
With the rapid development of functional genomics, the functions of non-coding RNA (ncRNA) transcripts have attracted increasing attention, especially those of long non-coding RNAs (lncRNAs), which make up a relatively large proportion of ncRNAs. Studies have shown that lncRNAs are involved in almost all physiological and pathological processes of organisms, and that their biological functions are closely related to their subcellular locations. Identifying lncRNA subcellular locations experimentally is time-consuming and labor-intensive, and it cannot keep pace with the rapidly growing volume of sequence data. It is therefore urgent to develop bioinformatics methods that predict lncRNA subcellular locations quickly and effectively. A growing number of experimental studies have also shown that a single lncRNA can reside at multiple locations in the cell and perform different biological functions at each of them, so accurately identifying the multiple subcellular locations of an lncRNA can deepen our understanding of its biological functions. This thesis analyzes and predicts both the single-localization and the multi-localization problems of lncRNAs. The main research contents and conclusions are as follows.

Firstly, a dataset of lncRNA sequences with one or two subcellular locations was constructed. k-mer features, representing local sequence information, and sequence order correlation factor (SOCF) features, representing global sequence information, were extracted. The features were screened by analysis of variance (ANOVA), and predictions were made with a support vector machine (SVM) under five-fold cross-validation. After feature fusion, the model achieved a Coverage of 87.22%, an Accuracy of 84.02%, and an Absolute_True of 77.41%, indicating good prediction performance.

Secondly, an exosome-related lncRNA dataset covering two subcellular locations, the nucleus and the cytoplasm, was constructed. The following features were extracted from each lncRNA sequence: k-mer information, k-mer information under the purine/pyrimidine reduction, k-mer information under the strong/weak hydrogen-bond reduction, three types of three-reading-frame information, and secondary structure information. After the class imbalance in these features was addressed with the SMOTE method, the minimum increment of diversity algorithm was used to reduce the high-dimensional features to two dimensions. The SVM algorithm was then used to predict lncRNA subcellular locations from single features and from multi-feature fusions. Fusing the 8-mer features with the two reduced 16-mer features gave a prediction accuracy of 98.07% in the jackknife test. In addition, the positional energy correlation function algorithm, and a fusion of that algorithm with the SVM algorithm, were applied to lncRNA subcellular localization prediction, reaching accuracies of 97.68% and 98.34%, respectively, in the jackknife test.

The methods and conclusions of this thesis provide a theoretical basis for better identifying lncRNA subcellular locations and for exploring lncRNA functions.
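The k-mer and reduced-alphabet features named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the standard nucleotide groupings (purines A/G versus pyrimidines C/U; strong G/C versus weak A/U, by number of hydrogen bonds), treats T as U, and normalizes counts by the number of valid windows. The helper names `reduce_sequence` and `kmer_frequencies` are hypothetical.

```python
from itertools import product

# Assumed standard nucleotide groupings; T is treated like U.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}
STRONG_WEAK = {"G": "S", "C": "S", "A": "W", "U": "W", "T": "W"}


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in a reduced alphabet; unknown bases (e.g. N) become 'N'."""
    return "".join(mapping.get(base, "N") for base in seq.upper())


def kmer_frequencies(seq, k, alphabet):
    """Normalized k-mer frequencies, ordered lexicographically over `alphabet`.

    Windows containing a symbol outside `alphabet` are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


seq = "AUGCGA"
raw = kmer_frequencies(seq.upper().replace("T", "U"), 2, "ACGU")
pur_pyr = kmer_frequencies(reduce_sequence(seq, PURINE_PYRIMIDINE), 2, "RY")
str_weak = kmer_frequencies(reduce_sequence(seq, STRONG_WEAK), 2, "SW")
print(pur_pyr)   # [0.2, 0.4, 0.4, 0.0]
print(str_weak)  # [0.4, 0.2, 0.2, 0.2]
```

Note that an 8-mer vector over {A, C, G, U} and a 16-mer vector over a two-letter reduced alphabet have the same length, 4^8 = 2^16 = 65,536 entries, so all three feature types are very high-dimensional.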
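ANOVA-based feature screening usually ranks each feature by its one-way F statistic, the ratio of between-class to within-class mean squares. The sketch below is a plain-Python illustration under the assumption that each location label is treated as a class; `anova_f` and `rank_features` are hypothetical helpers, and the thesis's exact screening procedure (for example, how many top-ranked features are kept) is not specified here.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature; `groups` holds its values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-class sum of squares: spread of the class means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_features(X, y):
    """Return feature indices sorted by decreasing F score (hypothetical helper)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, label in zip(X, y) if label == c] for c in classes]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1, 5], [2, 5], [3, 6], [4, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # [0, 1]
```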
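The multi-label scores reported for the first study are commonly defined per sample and then averaged over all N samples. With L the true location set and L* the predicted set: Coverage is the mean of |L ∩ L*| / |L|, Accuracy is the mean of |L ∩ L*| / |L ∪ L*|, and Absolute_True is the fraction of samples whose predicted set equals the true set exactly. The sketch assumes the thesis uses these standard definitions; `multilabel_scores` is a hypothetical name.

```python
def multilabel_scores(true_sets, pred_sets):
    """Return (Coverage, Accuracy, Absolute_True) averaged over samples.

    Assumes every true label set is non-empty; predicted sets may be empty.
    """
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need equally sized, non-empty lists of label sets")
    n = len(true_sets)
    coverage = accuracy = absolute_true = 0.0
    for true, pred in zip(true_sets, pred_sets):
        hit = len(true & pred)
        coverage += hit / len(true)                    # share of true labels recovered
        accuracy += hit / len(true | pred)             # overlap relative to the union
        absolute_true += 1.0 if true == pred else 0.0  # exact-match indicator
    return coverage / n, accuracy / n, absolute_true / n


true_sets = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"cytoplasm"}]
pred_sets = [{"nucleus"}, {"nucleus"}, {"nucleus", "cytoplasm"}]
cov, acc, abs_true = multilabel_scores(true_sets, pred_sets)
print(cov, acc, abs_true)
```

In this toy case Coverage is 2.5/3, Accuracy is 2/3, and Absolute_True is 1/3. Absolute_True is the strictest of the three, because a sample counts only when its whole location set is predicted exactly.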
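The increment of diversity (ID) step can be illustrated with its usual definitions. For a count vector X = (n_1, ..., n_s) with N = n_1 + ... + n_s, the diversity is D(X) = N ln N - sum_i n_i ln n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y). ID is zero when the query Y has exactly the same composition as the source X and grows as the compositions diverge. The sketch assumes, without confirmation from the abstract, that the two-dimensional representation is the ID of a query's k-mer counts with respect to the pooled counts of the nucleus class and of the cytoplasm class; `id_features` and the toy counts are hypothetical.

```python
import math


def _xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum_i n_i ln n_i for a count vector."""
    return _xlogx(sum(counts)) - sum(_xlogx(n) for n in counts)


def increment_of_diversity(source, query):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when Y is proportional to X."""
    if len(source) != len(query):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(source, query)]
    return diversity(merged) - diversity(source) - diversity(query)


def id_features(query, class_sources):
    """Hypothetical helper: one ID value per class source (2 classes -> 2-D vector)."""
    return [increment_of_diversity(src, query) for src in class_sources]


# Toy pooled k-mer counts per class (illustrative numbers, not thesis data).
nucleus = [8, 2]
cytoplasm = [2, 8]
query = [4, 1]
features = id_features(query, [nucleus, cytoplasm])
# Classic minimum-ID rule: assign the query to the class with the smaller ID.
predicted = ["nucleus", "cytoplasm"][features.index(min(features))]
print(features, predicted)
```

The final lines apply the classic minimum-ID decision rule directly; in the pipeline the abstract describes, the two ID values would instead serve as a 2-D input to the SVM.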
Keywords/Search Tags: long non-coding RNA, subcellular multi-localization, increment of diversity, support vector machine, positional energy correlation function