| Every year, tuberculosis seriously harms the health of people throughout the world. Genome research on Mycobacterium tuberculosis, the causative agent of most cases of tuberculosis, has made great progress, and a whole-genome annotation of Mycobacterium tuberculosis is available in public genome databases. However, the functional information in a genome annotation becomes outdated in the years after sequencing. Current databases may contain genes of known function that had not yet been assigned functional information when the analyzed genome was sequenced. This newly added functional information provides a source of function transfer for some hypothetical genes in the analyzed genome. In addition, a few genes missed by the original annotation can sometimes be found through similarity alignment. In this paper, we address these problems and re-annotate the genome of Mycobacterium tuberculosis, using similarity alignment and ab initio programs to find new genes. All re-annotation is based on the latest genome databases. Our re-annotation procedure may also serve as a reference for improving the annotation of other bacterial genomes. The main contents of this study are as follows. 1. Based on the Z-curve representation of DNA sequences, two sets of DNA sequences are generated as training sets. The positive set consists of genes of known function that have definite names. The negative set consists of randomly shuffled versions of the positive sequences. A Fisher model (with 5-fold cross-validation) is trained on the positive and negative samples, and the hypothetical proteins that it predicts to be non-coding are then eliminated from the hypothetical protein collection. 2.
In this work, two programs, Prodigal and ZCURVE, are used to find candidate new genes that have a low overlap rate with the annotated genes. The candidate genes are then submitted to BLAST to find all homologous genes. Finally, the genuine genes that pass the filtering parameters are assigned functional information. 3. Genome re-annotation usually requires researchers to check the original annotation manually. This becomes very laborious when a large number of DNA sequences must be re-annotated, especially when filtering the BLAST results. Therefore, a web tool, written mainly in PHP, is designed for automatic genome re-annotation; it reduces the manual work and increases the efficiency of genome re-annotation. |
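
To illustrate item 1, the following minimal Python sketch computes phase-specific Z-curve components and builds a shuffled negative sample. It assumes the common 9-component form of the Z-curve (three components per codon position). The function names, the toy sequence, and the reduced feature set are illustrative assumptions, not the exact variables used in this study, and the resulting vectors would still have to be passed to the Fisher model.

```python
import random

def zcurve_features(seq):
    """Return 9 phase-specific Z-curve components (x, y, z per codon position)."""
    seq = seq.upper()
    features = []
    for phase in range(3):
        bases = seq[phase::3]          # bases at this codon position
        n = len(bases) or 1            # avoid division by zero on very short input
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        features.append((a + g) - (c + t))  # x: purine vs. pyrimidine
        features.append((a + c) - (g + t))  # y: amino vs. keto
        features.append((a + t) - (g + c))  # z: weak vs. strong hydrogen bonding
    return features

def shuffled_negative(seq, rng):
    """Shuffle the bases of a positive sequence to obtain a negative sample."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
gene = "ATGGCCAAGTTTGACTAA"            # toy sequence, not a real M. tuberculosis gene
decoy = shuffled_negative(gene, rng)
pos_vec = zcurve_features(gene)
neg_vec = zcurve_features(decoy)
```

Shuffling keeps the overall base composition of each positive sequence but disrupts its codon-position structure, which is the signal the phase-specific components are meant to capture.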
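
The overlap criterion in item 2 can be sketched as follows. This is a simplified illustration: it assumes 1-based inclusive coordinates, ignores strand, and uses a hypothetical threshold of 0.2, since the abstract does not state the cutoff actually applied. The function names and coordinates are invented for the example.

```python
def overlap_fraction(candidate, annotated):
    """Largest overlap with any annotated gene, as a fraction of the candidate length."""
    c_start, c_end = candidate
    length = c_end - c_start + 1
    best = 0
    for a_start, a_end in annotated:
        shared = min(c_end, a_end) - max(c_start, a_start) + 1
        best = max(best, shared)       # negative values (no overlap) never beat 0
    return best / length

def novel_candidates(candidates, annotated, max_overlap=0.2):
    """Keep candidates whose overlap fraction does not exceed the (assumed) threshold."""
    return [c for c in candidates if overlap_fraction(c, annotated) <= max_overlap]

annotated = [(100, 400), (900, 1500)]                 # hypothetical annotated genes
candidates = [(150, 380), (390, 700), (2000, 2300)]   # hypothetical predictions
kept = novel_candidates(candidates, annotated)
```

Here the first candidate lies entirely inside an annotated gene and is dropped, while the other two overlap little or not at all and are kept for the BLAST step.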
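
The BLAST filtering step that item 3 aims to automate might look like the sketch below. It is written in Python rather than PHP, purely for illustration, and is not the tool described here. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns; the E-value and identity thresholds and the sample hits are hypothetical.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_hits(text):
    """Parse tab-separated BLAST hits into dictionaries keyed by column name."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                   # skip blank and comment lines
        row = dict(zip(BLAST_FIELDS, line.split("\t")))
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        hits.append(row)
    return hits

def filter_hits(hits, max_evalue=1e-10, min_identity=40.0):
    """Keep hits that satisfy the (assumed) E-value and identity thresholds."""
    return [h for h in hits
            if h["evalue"] <= max_evalue and h["pident"] >= min_identity]

# Made-up hits for two candidate genes.
sample = "\n".join([
    "cand1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350",
    "cand1\tprotB\t25.0\t60\t45\t0\t10\t70\t5\t65\t0.5\t30",
    "cand2\tprotC\t35.0\t150\t97\t1\t1\t150\t3\t152\t1e-12\t80",
])
good = filter_hits(parse_hits(sample))
```

Only the first hit passes both thresholds in this example, so only `cand1` would receive transferred functional information from `protA`.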