| With the completion of the human genome draft, genomics research has entered the era of functional genomics, and its most challenging task is now to identify genes and their regulatory networks. As the component that controls the initiation and rate of a gene's transcription, the promoter plays a key role in the regulation of gene expression, and promoter identification is one of the key problems in gene finding. Because eukaryotic promoters are closely related to humans and human activities, eukaryotic promoter recognition has become an active research field. Within this field, many important results have been obtained for mammalian (human and mouse) promoters, but the identification of plant promoters, an important class of eukaryotic promoters, is still at an early stage, and relatively few papers address it. One reason is the lack of validated promoters. Recently, with the completion of more comprehensive plant databases, plant promoter recognition has become one of the active research topics in bioinformatics, and low specificity remains one of its main problems. Based on an extensive review of the Chinese and English literature, this thesis analyzes existing plant promoter recognition algorithms and, to address the problem of low specificity, proposes two novel plant promoter recognition algorithms. The first is based on GC-skew and a support vector machine (SVM). It makes full use of the GC-skew characteristics of plant promoters and the strong classification performance of SVMs. First, DNA sequences are classified as GC-skew or non-GC-skew sequences according to their G and C content; structural and signal features are then extracted from the classified sequences; finally, an SVM classifier recognizes the promoters. 
The SVM classifier consists of four SVM sub-classifiers, the promoter-3′UTR, promoter-5′UTR, promoter-Intergenic and promoter-CDS sub-classifiers, each of which separates promoters from one class of non-promoter sequence. The outputs of the four sub-classifiers are combined to decide whether a sequence is a plant promoter. The second algorithm is based on GC-skew and DNA double-strand characteristics and shares the system architecture of the first. Its distinguishing trait is the combination of GC-skew and DNA double-strand features, which makes the extracted features more effective. Experimental results show that both proposed algorithms are effective and achieve higher specificity in identifying plant promoters. |
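The GC-skew pre-classification step described above can be sketched as follows. GC skew is commonly defined as (G - C) / (G + C). The abstract does not say whether skew is measured over the whole sequence or in sliding windows, nor what cutoff separates GC-skew from non-GC-skew sequences, so the whole-sequence measure, the `SKEW_THRESHOLD` value and the helper names below are hypothetical choices for illustration only.

```python
from typing import List, Tuple

# Hypothetical cutoff: the thesis abstract does not state the criterion it uses.
SKEW_THRESHOLD = 0.1


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) over the whole sequence; 0.0 if it has no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def is_gc_skewed(seq: str, threshold: float = SKEW_THRESHOLD) -> bool:
    """Treat a sequence as 'GC-skew' when the magnitude of its skew reaches the cutoff."""
    return abs(gc_skew(seq)) >= threshold


def split_by_skew(seqs: List[str], threshold: float = SKEW_THRESHOLD) -> Tuple[List[str], List[str]]:
    """Partition sequences into (GC-skew, non-GC-skew) groups for separate feature extraction."""
    skewed: List[str] = []
    non_skewed: List[str] = []
    for s in seqs:
        (skewed if is_gc_skewed(s, threshold) else non_skewed).append(s)
    return skewed, non_skewed


print(gc_skew("GGGC"))                          # 0.5
print(split_by_skew(["GGGCAT", "GCGCAT"]))      # (['GGGCAT'], ['GCGCAT'])
```

Each resulting group would then go through its own feature extraction and classification, which is one way to read "features are extracted in the classified sequences".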
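The four-sub-classifier architecture can be sketched as an ensemble of binary classifiers, each separating promoters from one negative class (3′UTR, 5′UTR, intergenic, CDS). The abstract does not specify how the four outputs are synthesized; the unanimous-vote rule below (`min_votes=4`) is an assumption, chosen because requiring every sub-classifier to agree tends to favor specificity over sensitivity. Each sub-classifier is represented as a plain callable returning a decision value, playing the role that a trained SVM's decision function (for example scikit-learn's `SVC.decision_function`) would; the names `combine_votes` and `NEGATIVE_CLASSES` are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A sub-classifier maps a feature vector to an SVM-style decision value:
# > 0 means "promoter", <= 0 means "the negative class it was trained against".
DecisionFn = Callable[[Sequence[float]], float]

# One sub-classifier per negative class, mirroring the four named in the abstract.
NEGATIVE_CLASSES = ("3'UTR", "5'UTR", "Intergenic", "CDS")


def combine_votes(sub_classifiers: Dict[str, DecisionFn],
                  features: Sequence[float],
                  min_votes: int = 4) -> bool:
    """Return True if at least `min_votes` sub-classifiers label the sequence a promoter.

    The default of 4 (unanimity) is a hypothetical rule, not the thesis's stated method.
    """
    votes = sum(1 for name in NEGATIVE_CLASSES
                if sub_classifiers[name](features) > 0)
    return votes >= min_votes


# Toy stand-ins for trained SVMs: three vote "promoter", the CDS model votes "CDS".
toy = {name: (lambda x: 1.0) for name in NEGATIVE_CLASSES}
toy["CDS"] = lambda x: -0.5
print(combine_votes(toy, [0.0]))               # False (not unanimous)
print(combine_votes(toy, [0.0], min_votes=3))  # True  (majority)
```

Lowering `min_votes` to 3 turns the rule into a majority vote, trading some specificity for sensitivity.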
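The abstract does not define its "DNA double strand" features. One plausible reading, assumed here purely for illustration, is to compute the same features on the given strand and on its reverse complement and concatenate them. The sketch uses dinucleotide frequencies as the per-strand feature; that choice and the function names are hypothetical. Because the reverse complement swaps G and C counts, the GC skew of the opposite strand is the negation of the forward-strand skew, which is one reason strand-aware features can carry information that a single-strand view misses.

```python
from itertools import product
from typing import List

# Complement table for A/C/G/T; any other symbol (e.g. N) is left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def reverse_complement(seq: str) -> str:
    """Return the opposite strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dinucleotide_freqs(seq: str) -> List[float]:
    """Relative frequency of each dinucleotide; pairs containing non-ACGT symbols are ignored."""
    s = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def double_strand_features(seq: str) -> List[float]:
    """Concatenate forward-strand and reverse-complement features (32 values)."""
    return dinucleotide_freqs(seq) + dinucleotide_freqs(reverse_complement(seq))


print(reverse_complement("AACG"))         # CGTT
print(len(double_strand_features("ACGTTG")))  # 32
```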