Research Of Density-based Clustering Algorithm Based On BPP Scores For RNA Suboptimal Secondary Structure

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Wang
GTID: 2180330452954781
Subject: Computer application technology
Abstract/Summary:
RNA molecules are indispensable in biology, participating in a series of the cell's most fundamental processes, including catalysis, RNA splicing, RNA editing, and the regulation of transcription and translation. The function of an RNA molecule is closely related to its structure. However, the experimental determination of RNA structure is expensive and time-consuming, and computational approaches to RNA tertiary structure are so far less than optimal. Computational methods for modeling RNA secondary structure have proven valuable for determining the tertiary structure and function of an RNA molecule. Predicting RNA secondary structure from a free-energy model raises the problem that the true structure may be a suboptimal structure within an energy increment above the minimum free energy. The accuracy of RNA structure prediction can be improved by grouping suboptimal structures into a small number of clusters and computing a representative structure for each cluster. In this paper, we study clustering algorithms for RNA secondary structure prediction; the contributions are as follows.

Firstly, a density-based clustering algorithm with extensible radius, dubbed ER-DBSCAN, is proposed to cluster RNA suboptimal structures, for which neither the distribution nor the number of clusters is known in advance. The algorithm selects different initial radii for clusters with different densities, and clustering proceeds from higher-density points towards lower-density points. It selects the unclassified point of highest density as the starting point of a new cluster, and the radius of the cluster is automatically adjusted during cluster expansion according to the density distribution and density variation. This method not only allows proper density variations within clusters, but also detects clusters separated by regions of different densities.

Secondly, this study introduces a density clustering algorithm based on feature selection, called RSFS-ER, to cluster RNA suboptimal structures. The RSFS-ER algorithm uses a cluster ensemble to generate the consensus matrix, which reflects the internal structure of the data set. It evaluates the importance of each feature for clustering by comparing the consensus matrix with the similarity matrix of each feature. The ER-DBSCAN algorithm is then run on the data set restricted to the optimal feature subset to ensure the quality of the clustering results.

Finally, this study uses the RBP score as a measure of RNA secondary structure characteristics and calculates the RBP matrix using the RBP algorithm. The ER-DBSCAN and RSFS-ER algorithms are implemented to cluster RNA secondary structures using the RBP matrix as their input, and the experimental results are analyzed.
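To illustrate the idea of starting each cluster from the densest unclassified point, here is a minimal Python sketch. It is not the thesis's ER-DBSCAN: the abstract does not specify the radius-adjustment rule, so the sketch keeps a single fixed radius, and it substitutes the plain base-pair distance between dot-bracket strings for the RBP score. The names base_pairs, bp_distance, density_first_clusters, radius and min_neighbors are all hypothetical choices made for this example.

```python
from collections import deque


def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def bp_distance(a, b):
    """Base-pair distance: number of pairs found in exactly one structure.

    Assumption: this is a stand-in for the RBP score used in the thesis.
    """
    return len(base_pairs(a) ^ base_pairs(b))


def density_first_clusters(structures, radius, min_neighbors=1):
    """Cluster structures, seeding each cluster at the densest unlabeled point.

    Assumption: the radius is fixed; the adaptive radius of ER-DBSCAN is
    not reproduced here. Returns one label per structure, -1 meaning noise.
    """
    n = len(structures)
    dist = [[bp_distance(s, t) for t in structures] for s in structures]
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= radius]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    # Visit candidate seeds from highest to lowest density.
    for seed in sorted(range(n), key=lambda i: -density[i]):
        if labels[seed] != -1 or density[seed] < min_neighbors:
            continue
        labels[seed] = cluster
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            if density[p] < min_neighbors:
                continue  # sparse point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


structures = [
    "((((....))))",
    "(((......)))",
    ".(((....))).",
    "((..))((..))",
    "(....)((..))",
    "............",
]
print(density_first_clusters(structures, radius=2))  # -> [0, 0, 0, 1, 1, -1]
```

In this toy input the three hairpin-like structures form one cluster, the two double-hairpin structures form another, and the unpaired structure is left as noise. A representative structure for each cluster could then be chosen, for instance, as the member with the smallest total distance to the other members.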
Keywords/Search Tags: RNA secondary structure, RNA suboptimal structure, density clustering algorithm, extensible radius, feature selection, relaxed base-pair score