
RNA Secondary Structure Shape-base Pair Distance And Semi-supervised Clustering Algorithm

Posted on: 2020-05-05 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Li | Full Text: PDF
GTID: 2370330599460539 | Subject: Engineering
Abstract/Summary:
RNA, one of the indispensable macromolecules in living organisms, not only plays a decisive role in the translation of genetic information but also performs functions such as enzymatic catalysis, cell regulation, and carrying viral genetic information. RNA spatial structure is critical to RNA functional diversity, since RNAs with different spatial structures differ in function, and the RNA secondary structure determines how the spatial structure forms. The real RNA secondary structure lies within the set of secondary structures whose free energies fall within a certain range above the minimum free energy. It is therefore valuable, for predicting the real RNA structure, to calculate the distances between RNA secondary structures, partition the set of structures with a clustering algorithm, and select a representative structure from each cluster for further study. This thesis studies an algorithm for computing the distance between RNA secondary structures and an algorithm for clustering them.

First, this thesis presents Rsd-bp, an RNA shape-base pair distance algorithm, to address two problems of existing algorithms for the distance between RNA secondary structures: large error and a lack of diversity in the basis of calculation. In the first step, the shape distance between secondary structures is computed: each RNA secondary structure is abstracted as a signed ordered tree, and the shape distance is calculated through edit operations such as conversion (relabeling) and deletion. In the second step, the shape distance is combined with the base pair distance: the results of the two calculations are normalized and their average is taken as the score. In the third step, the computational efficiency of Rsd-bp is improved through multi-process optimization.

Second, traditional k-medoids initializes the medoids by random selection, which may lead only to a locally optimal solution. This thesis therefore proposes a semi-supervised k-medoids algorithm for RNA secondary structures, SS-medoids+. It computes the distance matrix and the constraint set of the RNA secondary structures based on Rsd-bp, and preprocesses the constraint set to obtain supervision information. This supervision information guides both medoid initialization and data classification. The medoid update rule of k-medoids is also improved so that the search scope is narrowed when the medoids are updated.

Finally, the feasibility of the two algorithms, Rsd-bp and SS-medoids+, is validated through two comparative experiments. The first experiment tests the discriminative ability and computational efficiency of Rsd-bp in calculating RNA secondary structure distances. The second runs SS-medoids+ clustering on a set of RNA secondary structures and evaluates its clustering quality and running time by computing clustering evaluation indices.
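The abstract does not give Rsd-bp's exact formulas, so the sketch below illustrates only the second step, under stated assumptions. Structures are dot-bracket strings of equal length. The base pair distance is taken to be the number of base pairs found in exactly one of the two structures, normalized by the total pair count |P1| + |P2| (the largest value that count can reach). The shape distance is supplied by the caller together with a hypothetical normalizer `shape_scale`. All function names are illustrative, not the thesis's code.

```python
from typing import Set, Tuple


def pair_set(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must describe the same sequence length")
    return len(pair_set(s1) ^ pair_set(s2))


def combined_distance(shape_dist: float, shape_scale: float, s1: str, s2: str) -> float:
    """Average of a normalized shape distance and a normalized base pair distance.

    shape_scale is a hypothetical normalizer for the shape term (assumption).
    The base pair term is divided by |P1| + |P2|, its maximum possible value.
    """
    n_pairs = len(pair_set(s1)) + len(pair_set(s2))
    bp_norm = base_pair_distance(s1, s2) / n_pairs if n_pairs else 0.0
    shape_norm = shape_dist / shape_scale if shape_scale else 0.0
    return (shape_norm + bp_norm) / 2
```

For example, `"((...))"` and `"(.....)"` share the outer pair and differ by one inner pair, so their base pair distance is 1 and the normalized term is 1/3.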
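The first step, a shape distance obtained from tree edit operations, can be sketched as follows. This is an assumption-laden illustration: the abstract does not describe the thesis's signed ordered tree or its edit costs. Here each helix (a run of directly stacked pairs) becomes one node labeled `"S"`, unpaired bases are dropped, and the distance is a standard ordered-forest edit distance with unit costs for deletion, insertion, and relabeling. It is computed by the classic rightmost-root recursion with memoization. Names such as `shape_tree` and `forest_dist` are hypothetical.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# A node is (label, children); a forest is a tuple of nodes. Nested tuples are
# hashable, which lets lru_cache memoize pairs of forests.
Forest = Tuple[Tuple[str, tuple], ...]


def pair_table(db: str) -> List[Optional[int]]:
    """pt[i] is the partner of position i, or None if i is unpaired."""
    pt: List[Optional[int]] = [None] * len(db)
    stack: List[int] = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            i = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced '('")
    return pt


def shape_tree(db: str) -> Forest:
    """Collapse each helix into one 'S' node and drop unpaired bases (assumed abstraction)."""
    pt = pair_table(db)

    def build(lo: int, hi: int) -> Forest:
        nodes = []
        i = lo
        while i < hi:
            j = pt[i]
            if j is None:  # unpaired base: not part of the shape
                i += 1
                continue
            a, b = i, j
            while a + 1 < b - 1 and pt[a + 1] == b - 1:  # stacked pair extends the helix
                a, b = a + 1, b - 1
            nodes.append(("S", build(a + 1, b)))  # children: helices inside the loop
            i = j + 1
        return tuple(nodes)

    return build(0, len(db))


def size(forest: Forest) -> int:
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f: Forest, g: Forest) -> int:
    """Unit-cost ordered forest edit distance (rightmost-root recursion)."""
    if not f:
        return size(g)  # insert everything in g
    if not g:
        return size(f)  # delete everything in f
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + cv, g) + 1,  # delete rightmost root of f
        forest_dist(f, g[:-1] + cw) + 1,  # insert rightmost root of g
        forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + (lv != lw),  # map roots
    )


def shape_distance(db1: str, db2: str) -> int:
    return forest_dist(shape_tree(db1), shape_tree(db2))
```

Because every node carries the same label here, relabeling is free and only structural differences count. A scheme with distinct node labels, such as the signs used in the thesis, would enter through the `lv != lw` term.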
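A minimal sketch of a semi-supervised k-medoids loop on a precomputed distance matrix `D`, such as one filled with Rsd-bp scores. The abstract does not specify the constraint types or the exact update rule, so this assumes the following. Supervision is a list of must-link pairs, and preprocessing takes their transitive closure with union-find. The largest resulting groups seed the initial medoids, and any remaining medoids are chosen by farthest-point selection instead of at random. Each group is assigned to a cluster as a whole. The medoid update searches only a cluster's own members, optionally truncated to the `scope` members nearest the current medoid. `scope` and all function names are hypothetical, not SS-medoids+ itself.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def must_link_groups(n: int, must_link: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Transitive closure of must-link pairs (union-find); returns groups of 2+ points."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return [g for g in groups.values() if len(g) > 1]


def medoid_of(members: Sequence[int], D: Matrix) -> int:
    """Member with the smallest total distance to the other members."""
    return min(members, key=lambda c: sum(D[c][m] for m in members))


def semi_supervised_kmedoids(D: Matrix, k: int, must_link: Sequence[Tuple[int, int]],
                             scope: Optional[int] = None, max_iter: int = 100):
    n = len(D)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    groups = sorted(must_link_groups(n, must_link), key=len, reverse=True)
    grouped = {x for g in groups for x in g}

    # Initialization from supervision: medoids of the largest must-link groups.
    medoids = [medoid_of(g, D) for g in groups[:k]]
    if not medoids:
        medoids.append(medoid_of(range(n), D))
    while len(medoids) < k:  # farthest-point fill instead of random choice
        rest = [x for x in range(n) if x not in medoids]
        medoids.append(max(rest, key=lambda x: min(D[x][m] for m in medoids)))

    labels = [0] * n
    for _ in range(max_iter):
        # Assignment: a must-link group moves as a unit; free points go to the nearest medoid.
        for g in groups:
            best = min(range(k), key=lambda c: sum(D[x][medoids[c]] for x in g))
            for x in g:
                labels[x] = best
        for x in range(n):
            if x not in grouped:
                labels[x] = min(range(k), key=lambda c: D[x][medoids[c]])
        # Update: search only the cluster's members (optionally the `scope` nearest ones).
        new_medoids = []
        for c in range(k):
            members = [x for x in range(n) if labels[x] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            cands = sorted(members, key=lambda x: D[x][medoids[c]])
            if scope is not None:
                cands = cands[:scope]
            new_medoids.append(min(cands, key=lambda x: sum(D[x][m] for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels
```

For instance, with points at positions 0, 1, 9 and 10 on a line, `D[i][j] = |p_i - p_j|`, `k = 2` and `must_link = [(1, 2)]`, point 2 is placed in the same cluster as points 0 and 1 even though it is nearer to point 3.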
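The abstract does not name the clustering evaluation index it computes. As one common index that works directly on a distance matrix, the mean silhouette coefficient can be sketched as below. This is a swapped-in illustration, not necessarily the index used in the thesis; `mean_silhouette` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def mean_silhouette(D: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient computed from a precomputed distance matrix."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(D[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned points.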
Keywords/Search Tags: RNA secondary structure, tree model, distance calculation, semi-supervised clustering, k-medoids, Rsd-bp