
Research Of RNA Secondary Structures Similarity Based On Hausdorff And Cosine Metric

Posted on: 2017-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: N N An
GTID: 2180330503982174
Subject: Computer Science and Technology
Abstract/Summary:
Research on the similarity of RNA secondary structures serves two purposes. First, it helps researchers understand RNA sequences and explore the true conformation of RNA secondary structures. Second, it can be applied to RNA secondary structure prediction, which in turn serves as a basis for exploring RNA tertiary structure. Researchers have proposed a variety of algorithms for measuring RNA secondary structure similarity. Pseudoknots play an important functional role in RNA structures and are a key issue in measuring that similarity.

First, a new method of representing RNA secondary structures is proposed. A representation converts the abstract information in an RNA secondary structure into a form that mathematical formulas or computer programs can recognize and process. The proposed coordinate method captures the position of each base pair and the overall frame of the RNA secondary structure, which makes the information usable by similarity algorithms.

Second, to address the deviation that isolated base pairs introduce into the Hausdorff distance calculation, a fold line algorithm based on the Hausdorff distance, named FL-HD, is proposed. FL-HD uses a max-min distance: taking into account differences in the spatial shape of the RNA secondary structures and in the relative positions of base pairs, it computes the distance between the discrete base pairs and the fold lines and uses that distance to measure the similarity of RNA secondary structures.

Third, for RNA secondary structures that contain pseudoknots, the thesis proposes the Cos-RNACompare algorithm, which is based on cosine similarity in a vector space. Each RNA secondary structure is represented by a two-dimensional vector, and the cosine of the angle between two such vectors represents the similarity of the corresponding RNA secondary structures with pseudoknots.

Finally, a set of RNA secondary structures of different lengths and types is divided into three experimental groups. The feasibility of the FL-HD and Cos-RNACompare algorithms is validated separately on the first two comparative experimental groups, and the effectiveness of Cos-RNACompare is evaluated from the distribution of results on the last experimental group.
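The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's FL-HD or Cos-RNACompare algorithm, whose fold-line construction and vector encoding are not given here. It only shows the classic Hausdorff (max-min) distance between two sets of base-pair coordinates and the cosine similarity between two vectors. The dot-bracket input format, the helper `parse_base_pairs`, and the example vectors are assumptions made for illustration.

```python
import math


def parse_base_pairs(dot_bracket):
    """Return base pairs (i, j), 1-indexed, from a dot-bracket string.

    Assumption: '(' / ')' mark ordinary pairs and '[' / ']' mark a
    pseudoknot, as in common extended dot-bracket notation.
    """
    stacks = {"(": [], "[": []}
    closers = {")": "(", "]": "["}
    pairs = []
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            pairs.append((stacks[closers[ch]].pop(), pos))
    return sorted(pairs)


def directed_hausdorff(a, b):
    """Max over points in a of the distance to the nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Classic symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Each base pair (i, j) is treated as a point in the plane.
pairs_a = parse_base_pairs("((..))")   # two nested pairs
pairs_b = parse_base_pairs("(....)")   # one pair
distance = hausdorff(pairs_a, pairs_b)

# Hypothetical two-dimensional vectors; how the thesis derives them
# from a structure is not specified in the abstract.
similarity = cosine_similarity((3, 4), (6, 8))
```

A smaller Hausdorff distance means the base-pair point sets lie closer together, while a cosine closer to 1 means the two vectors point in more similar directions. Because the plain Hausdorff distance takes a maximum over points, a single isolated base pair can dominate the result, which is the kind of deviation the abstract says FL-HD is designed to reduce.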
Keywords/Search Tags: RNA secondary structure, pseudoknot, Hausdorff distance, cosine similarity