| With the rapid development of gene sequencing techniques such as RNA-seq (high-throughput transcriptome sequencing), researchers have determined all human RNA sequences. Many human RNA sequences contain sequence fragments that recur again and again. These highly similar, repetitive fragments are called patterns (motifs) in RNA sequences. The occurrences of a given RNA sequence pattern perform similar biological functions and carry a large amount of valuable biological information. Because the volume of RNA sequence data is enormous, systematically mining sequence patterns from it and predicting the biological function of those patterns is a difficult problem. In this paper, we design an algorithm that mines the relationships between RNAs, finds the pattern fragments they share, and then predicts the biological function of the RNA sequence patterns through a series of statistical analyses. The work can be divided into two parts. (1) A pattern recognition method based on RNA sequence similarity. Because similar sequence patterns are shared among RNAs, their statistical significance is much higher than that of random strings. Using the GENCODE database, which contains all human RNA, as the data set, this paper designs a recursive pairwise sequence alignment algorithm (the RAP algorithm), an RNA similar-fragment clustering algorithm, and a greedy RNA pattern de-redundancy algorithm to mine the pattern fragments that occur frequently in RNA sequences. (2) Prediction of the biological function of RNA sequence patterns. After the frequent patterns have been found through pattern recognition, an algorithmic framework that maps RNA sequence patterns to gene functions is designed by combining the GENCODE, UniProtKB, and Gene Ontology bioinformatics databases with related toolsets, and the biological functions of the RNA sequence patterns are predicted by statistical analysis. Systematic experiments show that the proposed method achieves its stated goal: it identifies a total of 218 RNA motifs and reliably predicts the biological function of several of them. |
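
The abstract does not spell out how the RAP algorithm works, so the following is only an illustrative sketch of the basic building block it names: a pairwise alignment that finds the fragment two RNA sequences share. It uses the standard Smith-Waterman local alignment, which is a substitute chosen here for illustration and not the paper's method. The function name `local_align` and the scoring values (`match`, `mismatch`, `gap`) are assumptions, as are the two toy sequences.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment (illustrative, not the paper's RAP).

    Returns (best_score, aligned_fragment_of_a, aligned_fragment_of_b).
    The scoring parameters are hypothetical defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    frag_a, frag_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            frag_a.append(a[i - 1])
            frag_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            frag_a.append(a[i - 1])
            frag_b.append("-")
            i -= 1
        else:
            frag_a.append("-")
            frag_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(frag_a)), "".join(reversed(frag_b))


# Toy sequences (made up for illustration) sharing the fragment AUGCAUG.
score, frag_a, frag_b = local_align("GGGAUGCAUGCCC", "UUAUGCAUGAA")
print(score, frag_a, frag_b)  # 14 AUGCAUG AUGCAUG
```

A recursive scheme in the spirit of the abstract could presumably re-run such an alignment on the sequence regions left over on either side of each shared fragment, but that extension is a guess and is not shown here.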