
The Prediction Of Fungi And Rice MicroRNA Based On Random Forest Algorithm

Posted on: 2021-05-27  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Zhang  Full Text: PDF
GTID: 2370330611483301  Subject: Physical chemistry
Abstract/Summary:
MicroRNAs are a class of endogenous, single-stranded, non-coding small RNAs about 20 to 24 nucleotides long. They are highly conserved in evolution, show specific expression patterns, and play a broad and important regulatory role in many physiological and pathological processes. So far, tens of thousands of microRNAs have been detected in animals, plants, viruses, and fungi, but a large number remain undiscovered. Identifying new microRNAs will therefore support deeper and more comprehensive study of their functions and of their regulatory roles in complex biological processes.

New microRNAs are discovered mainly by two kinds of methods: biological experiments and computational prediction. Experimental methods are more direct and accurate, but they are slow and costly, and they have difficulty cloning microRNAs that are expressed only in specific tissues or at specific stages. Computational prediction can compensate for these limitations. In recent years, as bioinformatics and machine learning have converged, prediction methods based on machine learning have become a research hotspot. In this thesis, two computational models built on bioinformatics and the random forest algorithm, milRNApredictor and plantMirP-rice, were constructed to predict and identify microRNAs in fungi and rice, respectively. The specific results are summarized as follows:

(1) Construction of milRNApredictor. We combined the k-mer scheme with a distance-dependent potential to construct 26 knowledge-based energy features and 80 k-mer features, and trained a random forest model for de novo prediction of fungal milRNAs. The model requires neither a reference genome nor milRNA precursor sequences. The AUC values for 4-, 6-, 8-, and 10-fold cross-validation are 0.8324, 0.8324, 0.8335, and 0.8362, respectively, which indicates that milRNApredictor is robust.

(2) Construction of plantMirP-rice. A total of 83 features were extracted to train a random forest model designed specifically to predict microRNA precursors (pre-miRNAs) in rice. These comprise 34 new knowledge-based energy features and 49 existing features. plantMirP-rice predicted rice pre-miRNAs well, with an accuracy of 93.48%. In addition, plantMirP-rice outperforms existing prediction tools in classifying plant pre-miRNAs.
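
As a rough illustration of the k-mer features mentioned in item (1), the sketch below computes normalised k-mer frequencies for a single RNA sequence. It is not the thesis implementation. The function name kmer_frequencies, the choice of k = 2 and k = 3, and the example sequence are all assumptions made for illustration. With those two k values the vector has 16 + 64 = 80 entries, which happens to match the feature count quoted above, but the abstract does not say which k values were actually used.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted below


def kmer_frequencies(seq, ks=(2, 3)):
    """Return a dict mapping each k-mer to its relative frequency in seq.

    Hypothetical helper: the k values are an assumption, not taken
    from the thesis.
    """
    seq = seq.upper().replace("T", "U")
    features = {}
    for k in ks:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        # Slide a window of width k along the sequence.
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:  # skip windows with ambiguous bases such as N
                counts[window] += 1
        total = sum(counts.values())
        for kmer in kmers:
            # Frequencies for each k sum to 1 when at least one window was counted.
            features[kmer] = counts[kmer] / total if total else 0.0
    return features


example = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU")
print(len(example))  # 80
```

A vector of this kind, concatenated with other features such as the energy terms, could then be passed to any random forest implementation and scored by k-fold cross-validation.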
Keywords/Search Tags:microRNA, milRNA, Random Forest, Fungi, Rice, Computational Prediction