Research on a Suitable K-medoids Clustering Algorithm for RNA Secondary Structure Prediction

Posted on: 2015-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: X F Liu
Full Text: PDF
GTID: 2180330422470502
Subject: Computer application technology
Abstract/Summary:
As one of the three most important macromolecules, RNA can affect gene expression and protein synthesis through specific functions, and thereby the growth and development of living organisms. Research has found that the function of an RNA molecule is closely related only to its structure and has no obvious relation to its nucleotide sequence, so researchers focus on the study of RNA structure. At present, owing to the limitations of scientific research, the crucial point in RNA research is the study of RNA secondary structure. In bioinformatics, predicting the real secondary structure from the suboptimal structure set accurately and efficiently with the free energy method is both very difficult and important. Using clustering techniques to analyze the suboptimal structure set and to filter out the representative structure of each cluster for further biological experiments can effectively reduce the number of output structures, ensure the quality of the data, and improve the prediction accuracy of the results. An effective clustering algorithm for RNA secondary structure prediction is therefore of great significance.
First, this thesis proposes an improved k-medoids clustering method, IC-kmedoids, which improves this accuracy by using an incremental candidate-set matrix of medoids. The algorithm reduces computational complexity by expanding the medoid candidate sets gradually. The RBP score is used as the feature of each folding structure. The algorithm selects the initial center points randomly. After the first partition, the candidate set of each center is restricted to the points that lie in the same cluster as that center. The candidate set of each center then grows gradually from one cluster to all clusters, at which point it is the same as in the original k-medoids algorithm, thus ensuring both the quality and the efficiency of the clustering results.
Second, an algorithm based on initial center optimization, ICO-IC-kmedoids, is proposed to cluster RNA folding structures. Selecting the initial centers randomly can easily cause the result to fall into a local optimum. The ICO-IC-kmedoids algorithm defines a distance threshold and an initial center candidate set, which avoid selecting adjacent data points and abnormal data or noise, respectively. Starting from the optimized initial centers, the algorithm then carries out the center replacement of the IC-kmedoids algorithm.
Finally, the IC-kmedoids and ICO-IC-kmedoids algorithms are implemented in Matlab 2010a and compared with the original k-medoids algorithm to verify the effectiveness of the improved methods.
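The two ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation (which was written in Matlab), and the abstract leaves several details open, so the following are assumptions made only for illustration: the input is a precomputed distance matrix rather than RBP scores; the initial center candidate set keeps the points with the lowest mean distance to all others (`keep_frac` is a hypothetical parameter); and each medoid's candidate set is widened by one neighbouring cluster whenever no improving swap is found. All function names are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def assign(dist, medoids):
    """For each point, the position (in `medoids`) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def pick_initial_medoids(dist, k, min_sep, keep_frac=0.8):
    """Assumed ICO-style initialisation (details are not given in the abstract).

    Outlier filter: keep only the keep_frac most central points as candidates.
    Distance threshold: skip a candidate closer than min_sep to a chosen center.
    """
    n = len(dist)
    mean_d = [sum(row) / n for row in dist]
    order = sorted(range(n), key=lambda i: mean_d[i])
    candidates = order[:max(k, int(keep_frac * n))]
    medoids = []
    for i in candidates:
        if len(medoids) < k and all(dist[i][m] >= min_sep for m in medoids):
            medoids.append(i)
    # Fallback (assumption): if the threshold was too strict, fill up anyway.
    for i in candidates:
        if len(medoids) < k and i not in medoids:
            medoids.append(i)
    return medoids


def ic_kmedoids(dist, medoids, max_rounds=1000):
    """k-medoids swaps with candidate sets that grow from one cluster to all."""
    medoids = list(medoids)
    n, k = len(dist), len(medoids)
    width = 1  # number of clusters each medoid may draw replacements from
    for _ in range(max_rounds):
        labels = assign(dist, medoids)
        best_cost, best_trial = total_cost(dist, medoids), None
        for c in range(k):
            # Own cluster first, then clusters whose medoids are nearest.
            near = sorted(range(k),
                          key=lambda j: (j != c, dist[medoids[c]][medoids[j]]))
            allowed = set(near[:width])
            for p in range(n):
                if labels[p] not in allowed or p in medoids:
                    continue
                trial = medoids[:c] + [p] + medoids[c + 1:]
                cost = total_cost(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is not None:
            medoids = best_trial      # accept the best improving swap
        elif width < k:
            width += 1                # no improvement: widen candidate sets
        else:
            break                     # full candidate sets and no improvement
    return medoids, assign(dist, medoids)


# Toy data: two well-separated groups on a line (illustrative only).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
init = pick_initial_medoids(dist, k=2, min_sep=5)
medoids, labels = ic_kmedoids(dist, init)
```

Once `width` reaches `k`, every non-medoid point is a candidate for every medoid, so the last phase behaves like an ordinary k-medoids swap search. The earlier, narrower phases evaluate fewer swaps per round, which is the source of the reduced cost the abstract describes.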
Keywords/Search Tags: RNA secondary structure, cluster algorithm, incremental center candidate set, initial center optimization, RBP score