Support Vector Machine And Codon Usage For Sequence Recognition

Posted on: 2007-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Zhou
GTID: 1104360212465665
Subject: Biomedical engineering
Abstract/Summary:
With the completion of the genome projects for human and several model organisms, the amount of biological data available in public databases is growing ever more rapidly. How to extract biological information from these raw data has become an urgent problem in genome research.

In this dissertation, the synonymous codon usage of genes in influenza A viruses, chlamydiae and yeast is analyzed. Codon usage is found to be influenced by several factors. Although genomic base composition and gene expression level are thought to be the dominant factors affecting codon usage, other factors such as strand-specific mutational bias, the hydropathy level of the encoded protein, gene function and meiotic recombination rate are also related to codon usage variation.

Codon usage is assumed to vary among different regions of a given gene. The synonymous codon usage in the translational initiation and termination regions of genes in yeast and coronavirus is analyzed. Most minor codons are found to be preferentially used in the translational initiation region, which is thought to have a negative effect on gene expression and can be explained by the 'minor codon modulator hypothesis'. In addition, minor codons are observed to be preferentially used in the terminal regions of coronavirus genes, which may also regulate the level of gene expression.

Based on the results of the codon usage analysis, the support vector machine (SVM) is applied to several topical problems in bioinformatics. First, nucleotide sequence information is used for the first time to recognize G-protein coupled receptor families, which yields high prediction accuracy. Second, a novel SVM method, which relies on differences in codon composition, is presented for classifying meiotic recombination hot and cold ORFs, located in hotspots and coldspots respectively, in Saccharomyces cerevisiae.
Moreover, a considerable correlation is found between meiotic recombination rate and the composition of certain amino acid residues, which probably reflects structural and functional dissimilarity between the hot and cold groups. Third, the prediction of horizontally transferred genes is improved by an SVM-based algorithm that treats genes on the leading strand and the lagging strand separately. In addition, a small interfering RNA (siRNA) efficacy prediction algorithm is developed using SVM with dinucleotide composition as the sequence attribute. This algorithm performs better than several previously published methods.
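As an illustration only, the codon composition and dinucleotide composition mentioned above can be turned into fixed-length feature vectors roughly as follows. This is a minimal sketch, not the dissertation's implementation: the function names (`kmer_composition`, `codon_composition`, `dinucleotide_composition`), the use of plain relative frequencies, and the skipping of ambiguous symbols are all assumptions made for the example.

```python
from itertools import product


def kmer_composition(seq, k, step=1, alphabet="ACGT"):
    """Relative frequency of every k-mer over `alphabet` (hypothetical helper).

    The returned list follows the order produced by itertools.product,
    so its length is always len(alphabet) ** k.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(0, len(seq) - k + 1, step):
        word = seq[i:i + k]
        if word in counts:  # assumption: words with ambiguous symbols (e.g. N) are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def codon_composition(orf):
    """64-dimensional codon frequency vector, read in frame from position 0."""
    return kmer_composition(orf, k=3, step=3, alphabet="ACGT")


def dinucleotide_composition(seq, alphabet="ACGU"):
    """16-dimensional overlapping dinucleotide frequency vector (RNA alphabet by default)."""
    return kmer_composition(seq, k=2, step=1, alphabet=alphabet)
```

Vectors of this kind, one per ORF or siRNA sequence, could then be passed together with class labels to any standard SVM implementation; the kernel and parameters used in the dissertation are not stated in this abstract.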
Keywords/Search Tags: Bioinformatics, Codon usage, Support vector machine, G-protein coupled receptor, Meiotic recombination, Horizontal gene transfer, RNA interference