Biological Sequence Analysis And Function Prediction

Posted on: 2014-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Zhang
Full Text: PDF
GTID: 2260330401986004
Subject: Computer application technology
Abstract/Summary:
Sequence analysis and function prediction are becoming increasingly important in bioinformatics with the completion of gene sequencing. Eukaryotic promoter prediction is an important part of DNA sequence analysis, and ncRNA, like protein, plays a crucial role in biological development. Promoter identification and ncRNA prediction are therefore vital to interpreting the entire genome.

The promoter prediction method is based on the following features of promoters: (1) promoter regions contain consensus sequences, but these sequences vary between promoters because of nucleotide variation; (2) the positions of the consensus sequences are not fixed and instead fluctuate within an approximate region; (3) most eukaryotic promoters are associated with CpG islands. A new promoter prediction method is presented that adopts a new statistical model and introduces, for the first time, the concept of an "Interval Position Weight Matrix" probability model. Experimental results on large sequences show that the new promoter prediction system is efficient, with higher sensitivity and specificity.

A new method is also proposed that predicts ncRNA from the Z-score of an unknown sequence. Computing a Z-score directly requires shuffling the sequence 1000 times, which wastes a great deal of time and makes the procedure slow. To solve this problem, a large sample set of about 10648 training sequences is first generated with MATLAB, and the Z-score of every training sequence is then computed. An unknown sequence can then be predicted by support vector machine (SVM) regression, which greatly improves prediction speed. Experimental results show that the algorithm is efficient, giving good predictions in little time.

The first algorithm proposed in this paper, for promoter prediction, is based on consensus sequence diversity analysis. Experimental results on large sequences and on human chromosome 22 sequences show that this method has good sensitivity and specificity. The second algorithm, regression analysis for non-coding RNA gene prediction, predicts unknown sequences well.
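
As an illustration of why the shuffle-based Z-score is slow, the following is a minimal Python sketch of the shuffling step. It is not the thesis implementation: the function names are hypothetical, and `stem_score` is a toy stand-in for whatever structural score the thesis uses (the abstract does not name one). Only the 1000 shuffles come from the abstract.

```python
import random
import statistics

# Watson-Crick pairs only; wobble pairs are ignored in this toy score.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_score(seq):
    """Toy, hypothetical score: count positions i whose base can pair
    with the mirrored position n-1-i, as in a single perfect hairpin."""
    n = len(seq)
    return sum((seq[i], seq[n - 1 - i]) in WATSON_CRICK for i in range(n // 2))


def shuffle_z_score(seq, score=stem_score, n_shuffles=1000, rng=None):
    """Z-score of `seq` against shuffles that keep its base composition."""
    rng = rng or random.Random()
    observed = score(seq)
    letters = list(seq)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # in-place mononucleotide shuffle
        shuffled_scores.append(score("".join(letters)))
    mean = statistics.fmean(shuffled_scores)
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        return 0.0  # every shuffle scored the same; Z-score is undefined
    return (observed - mean) / spread


# A hairpin-like sequence scores well above its shuffled versions.
z = shuffle_z_score("GCGCGCAAAAGCGCGC", rng=random.Random(0))
```

Every call scores the sequence 1001 times, which is the cost the abstract aims to avoid by training an SVM regression model on precomputed Z-scores. With an energy-based score, where lower values mean more stable structures, the sign convention would be reversed.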
Keywords/Search Tags: promoter prediction, CpG island, Interval Position Weight Matrix, Z-score values, non-coding RNA gene prediction, regression analysis
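
The abstract does not define the "Interval Position Weight Matrix", so the following Python sketch shows only one possible reading of feature (2): an ordinary log-odds position weight matrix scored at every start position inside an allowed interval, keeping the best hit. All function names, the pseudocount, the uniform background, and the example sites are assumptions made for illustration, not details from the thesis.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length aligned sites (hypothetical helper)."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds; unknown bases contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, window))


def best_hit_in_interval(pwm, sequence, start_lo, start_hi):
    """Best-scoring start within [start_lo, start_hi], or None if empty.

    Scanning an interval instead of one fixed offset lets the motif
    position fluctuate, which is the assumed meaning of "interval" here.
    """
    width = len(pwm)
    last_start = min(start_hi, len(sequence) - width)
    best = None
    for start in range(max(start_lo, 0), last_start + 1):
        score = score_window(pwm, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Made-up TATA-box-like sites, purely for demonstration.
example_sites = ["TATAAA", "TATATA", "TATAAG", "TTTAAA"]
example_pwm = build_pwm(example_sites)
example_sequence = "GCGCGC" + "TATAAA" + "GCGCGCGC"
hit = best_hit_in_interval(example_pwm, example_sequence, 2, 10)
```

Here `hit` holds the start position and score of the best window in the interval. A fuller model along the lines of the abstract would presumably combine such scores with CpG island evidence, which this sketch leaves out.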