Gene Recognition Research Based On Novel Features

Posted on: 2012-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: H F He
Full Text: PDF
GTID: 2230330395485621
Subject: Computer Science and Technology
Abstract/Summary:
The rapid progress of genome-sequencing projects for human and a variety of model organisms not only marks the arrival of the post-genomics era but also produces a large amount of genetic data. Bioinformatics can provide theoretical support for processing these data efficiently, and gene recognition is an important component of bioinformatics. A large number of algorithms have been applied to gene recognition, but some problems have still not been solved effectively, such as the recognition of short eukaryotic genes. In this work, we focus on the recognition of short coding regions in human genes.

Effective extraction of biological information affects the performance of gene recognition, and in this study we try to address the issue with new methods. We derived two new features by integrating information on the distribution of stop codons with information on base compositional bias. We also obtained pseudo-base composition features, which capture the interactions between bases at different positions, by transplanting the pseudo-amino acid composition to DNA sequences. The average accuracy achieved by the three new features was as high as 92.73% for fragments with a length of 192 base pairs. We then proposed a 15-dimensional feature vector that contains the features mentioned above. The accuracy of the algorithm with this feature vector reaches 95.65% at a length of 192 bp. We find that combining the two new features with the pseudo-base composition features improves the accuracy of coding region recognition.

The choice of classification method also affects the accuracy of the algorithm, and a precise model must be selected to obtain higher accuracy in short gene recognition. A good choice is the radial basis function (RBF) neural network. We introduced a sample filtering mechanism to address the problems of memory distortion and memory loss in the neural network. The mechanism is based on heuristic information obtained by combining K-means clustering results with sample type tags. Experiments on a single data set yield a number of results, from which we obtain a final result by a voting mechanism. With the methods above, good accuracy can be obtained in short coding region recognition.
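
As a rough illustration of the kind of information the abstract refers to, the following Python sketch counts stop codons in the three forward reading frames, computes base frequencies at each codon position, and combines several predicted labels by majority vote. It is a minimal sketch under stated assumptions, not the thesis's actual features or classifier: the function names (`stop_codon_counts`, `positional_base_frequencies`, `majority_vote`) and the exact feature definitions are hypothetical, and the abstract does not specify how the 15-dimensional vector is built.

```python
from collections import Counter

# Standard stop codons on the DNA coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_counts(seq):
    """Count stop codons in each of the three forward reading frames.

    Hypothetical helper: the thesis's own stop-codon feature may differ.
    """
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


def positional_base_frequencies(seq):
    """Frequencies of A, C, G, T at codon positions 1, 2, 3 (frame 0).

    Returns 12 values; an assumed stand-in for base compositional bias.
    """
    seq = seq.upper()
    freqs = []
    for pos in range(3):
        bases = seq[pos::3]
        tally = Counter(bases)
        total = len(bases) or 1  # avoid division by zero on empty input
        freqs.extend(tally[base] / total for base in "ACGT")
    return freqs


def majority_vote(labels):
    """Return the most frequent label among several classifier runs."""
    return Counter(labels).most_common(1)[0][0]
```

For a 192 bp fragment, `stop_codon_counts(fragment) + positional_base_frequencies(fragment)` would give a 15-value list, but this is only one plausible construction and should not be read as the feature vector described above.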
Keywords/Search Tags: Gene recognition, Coding region recognition, Extraction of biological information, RBF neural network