| Molecular evolution is the change in the sequence composition of cellular molecules over long periods of time. Genetic variants arise through mutation and spread and are maintained in populations through genetic drift or natural selection. We focused on bacterial evolution and mutation analysis, and used evolutionary features to predict essential genes and genomic islands. In our evolution and mutation analyses, we addressed two aspects: point mutation and horizontal gene transfer (the third part). The analyses of point mutation were based on both theoretical data (the first part) and experimental data (the second part). (1) Despite rapid progress in understanding the mechanisms that shape the evolution of proteins, the relative importance of the various factors remains to be elucidated. We assessed the effects of 16 different biological features on the evolutionary rates of protein-coding sequences in bacterial genomes and found that no single factor determines the evolution of bacterial proteins. Not only transcriptional abundance, but also protein-protein associations, essentiality, subcellular localization to the cytoplasmic membrane, transmembrane helices, and hydropathicity score independently and significantly affected evolutionary rates. In some species, protein-protein associations and essentiality showed higher correlations with evolutionary rates than transcriptional abundance did. We also provided evidence that using the codon adaptation index (CAI) to measure transcriptional abundance could overestimate the importance of expression level to evolutionary rates. Essential genes also showed greater codon bias than average in four types of analyses. The disagreement may arise from noise in the expression data. In addition, we systematically investigated the effects of head-on replication-transcription conflicts on protein evolution in bacterial and eukaryotic genomes.
Replication forks can be arrested and transcription slowed when head-on collisions occur, but not when co-directional collisions occur. Collisions may give rise to deleterious effects. Conserved transcripts are therefore more likely to be retained during evolution if they lie on the leading strand. Bacterial genes on the leading strand are consequently more abundant, evolve more slowly, and are older than those on the lagging strand. In eukaryotes, however, only housekeeping genes are expressed during replication. Selection against head-on conflicts is therefore weak in eukaryotes. (2) Mutation is the ultimate source of genetic variation and evolution. The studies described in the first part were based on theoretical mutation data. Mutation accumulation (MA) experiments are an alternative approach that allows de novo mutation events to be studied directly. We therefore constructed the Spontaneous Mutation Accumulation Lines (SMAL) resource, which now contains all currently publicly available MA lines characterized by high-throughput sequencing, for studying mutation. We relocated and mapped the mutations based on the most recent annotations. A total of 5608 single-base mutations and 540 other mutations were obtained and are recorded in the current version of the SMAL database. The integrated data in SMAL provide more accurate information that can be used in new theoretical analyses. We believe the SMAL resource will help researchers better understand the processes of genetic variation and the incidence of disease. (3) Horizontal gene transfer is another major way in which microorganisms mutate and evolve. Until now, few reports have provided evidence for the co-evolution of horizontally transferred genes and their hosts. We obtained 17 groups of homologous genomic islands.
Using phylogenetic analyses, we found that the topology of a distance tree based on the proteins of each group of homologous genomic islands was consistent with that of the tree based on the complete proteomes of the hosts. This result indicates that genomic islands and their bacterial hosts have co-evolved. It contradicts the commonly accepted view that the phylogenies of horizontally transferred sequences and their host organisms should be inconsistent. Based on these results, we developed a new tool to predict essential genes (the fourth part) and improved methods for predicting genomic islands (the fifth part). (4) Integrative genomics predictors, which score highly in predicting bacterial essential genes, are not feasible for most species because the required data sources are limited. As mentioned above, we confirmed that gene essentiality is related to evolutionary rate. We therefore developed a universal approach and tool, designated Geptop, that uses orthology and phylogeny to provide gene essentiality annotations. In a series of tests, Geptop yielded higher area under the receiver operating characteristic curve (AUC) scores than the integrative approaches. Geptop can be applied to any bacterial species whose genome has been sequenced. We also developed a new database that integrates quantitative fitness information for microbial genes. The database currently covers 16 experiments and records 2186 theoretical predictions based on Geptop. (5) Based on the evolutionary features of genomic islands, we predicted genomic islands in three bacterial pathogens that cause pneumonia. Using the cumulative GC profile combined with the h and BCN indices, we found eight genomic islands in the three pathogens. The results show that this method predicts genomic islands in bacteria well and has a lower false-positive rate than the SIGI method. We also adopted the Z-Island method to identify genomic islands in seven human pathogens.
Thirty-one genomic islands were found using this method. The Z-Island method detected more known genomic islands than SIGI-HMM and IslandPick. Furthermore, it maintained a better balance between specificity and sensitivity. In summary, we studied the evolution and mutation of bacteria and applied the results to the prediction of essential genes and genomic islands. Many issues remain worthy of further study. |