Development And Application Of Automated Methods For Identifying Crucial Mutations In The Evolution Of RNA Viruses Such As SARS-CoV-2

Posted on: 2023-11-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Ji
GTID: 1520306911467634
Subject: Biophysics
Abstract/Summary:
The outbreak of COVID-19 (coronavirus disease 2019), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), has become a global concern, and the virus continues to mutate, imposing a heavy medical burden. As a novel RNA virus, SARS-CoV-2 has a high mutation rate and adaptability, which poses a serious challenge to public health. Analysis of its mutation and evolution can therefore help us monitor the virus and may provide evidence for better prevention and control policies or for drug and vaccine design. The exponential growth of viral genomic data, driven by recent advances in sequencing technology, allows deeper analysis of viral mutation and evolution. However, the unprecedented amount of data is challenging for traditional phylogenetic methods, which were designed for medium-scale analysis. Moreover, identifying mutations that benefit viral fitness from a pool of mostly neutral mutations is difficult. Therefore, to rapidly filter potentially adaptive mutations from huge genomic datasets, the development of new methods and tools grounded in classical evolutionary theory is crucial for variant identification and for monitoring viral evolution.

Through our research on SARS-CoV-2 since the outbreak of COVID-19, we found that fixed and parallel mutations are linked to viral adaptation. At the early stage of the outbreak, the first 129 genome sequences of the virus had two fixed mutations at sites 8782 and 28144 (sites 8517 and 27641 of the joined coding sequences). The consistent nucleotide substitutions at the two sites were used to cluster the early strains into genotypes G1 and G2. The two fixed mutations and the corresponding genotyping indicate the genetic diversity of SARS-CoV-2 and the necessity of monitoring its mutations. As the outbreak of SARS-CoV-2 turned into a pandemic, we constructed the mutation profile of the virus and identified potentially adaptive mutations that occurred after one year of the outbreak. A total of 130 mutations were found in the sequences of 3,823 viral genomes representative of the 355,067 genome records uploaded to the database after one year. Of the 130 mutations, 75 were fixed non-synonymous mutations. The 24 potentially adaptive mutations were identified by being both fixed and parallel on the phylogenetic tree. For example, the D614G and N501Y mutations in the Spike protein were confirmed by further experiments to increase viral fitness. This work suggests that fixed and parallel mutations can be used to identify potentially adaptive mutations.

Based on the understanding of viral evolution gained from this earlier research, we aimed to develop a method that identifies fixed and parallel mutations from viral sequences and the corresponding phylogenetic tree, in order to provide evidence for recognizing potentially adaptive viral mutations. By implementing algorithms that resolve phylogenetic lineages and minimize entropy to infer fixed and parallel mutations, we developed an R package called sitePath (https://wuaipinglab.github.io/sitePath/). Test results show high accuracy in identifying fixed mutations. Applying sitePath to SARS-CoV-2 reveals 37 mutations that are both fixed and parallel. Most of the mutations used by the WHO (World Health Organization) to define VOCs (variants of concern) are found among these 37 mutations, and 26 of the 37 have been shown experimentally to be linked to viral fitness.

Apart from identifying existing individual potentially adaptive mutations, we also attempted to predict future adaptive mutation pairs. We used matrix decomposition to model the synergistic effect between adaptive mutations across viral strains. However, test results indicate that the model cannot distinguish mutation pairs that appear in the future from those that do not. The main reason may be epistasis among mutations, which disrupts the existing synergistic effects used to extrapolate future mutation pairs.

To summarize, this thesis uses fixed mutations to genotype early SARS-CoV-2 strains, identifies fixed and parallel mutations after continuous evolution of the virus, and provides evidence for monitoring potentially adaptive mutations of SARS-CoV-2. Furthermore, we developed sitePath for identifying fixed and parallel mutations and applied it to SARS-CoV-2. The results show the method's ability to infer potentially adaptive mutations for this novel, evolving virus.
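
The notion of a mutation being both fixed and parallel can be illustrated with a small Python sketch. This is not the sitePath algorithm, which is an R package that works on a phylogenetic tree; it is a simplified illustration under stated assumptions. Here the lineages are assumed to be already resolved and independent, a substitution counts as "fixed" in a lineage when the Shannon entropy of that alignment column within the lineage is zero and the residue differs from a reference, and it counts as "parallel" when it is fixed in at least two lineages. All function names, the threshold parameters, and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def site_entropy(residues):
    """Shannon entropy (in bits) of the residues observed at one site."""
    counts = Counter(residues)
    total = len(residues)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def fixed_substitutions(lineages, reference, max_entropy=0.0):
    """For each lineage, list (site_index, ref_residue, new_residue) tuples
    where the site is (near) invariant within the lineage and its dominant
    residue differs from the reference. Site indices are 0-based."""
    result = {}
    for name, seqs in lineages.items():
        found = []
        for i, ref in enumerate(reference):
            column = [seq[i] for seq in seqs]
            dominant = Counter(column).most_common(1)[0][0]
            if site_entropy(column) <= max_entropy and dominant != ref:
                found.append((i, ref, dominant))
        result[name] = found
    return result


def parallel_substitutions(fixed, min_lineages=2):
    """Substitutions fixed in at least `min_lineages` lineages.
    Assumes the lineages are evolutionarily independent."""
    seen = Counter(sub for subs in fixed.values() for sub in set(subs))
    return sorted(sub for sub, n in seen.items() if n >= min_lineages)


# Toy data (hypothetical, three-residue sequences only).
reference = "DNK"
lineages = {
    "A": ["GNK", "GNK", "GNK"],  # D->G fixed at site 0
    "B": ["GYK", "GYK"],         # D->G at site 0 and N->Y at site 1 fixed
    "C": ["DNK", "DYK"],         # site 1 is polymorphic, so nothing is fixed
}

fixed = fixed_substitutions(lineages, reference)
parallel = parallel_substitutions(fixed)
print(fixed)
print(parallel)
```

In this toy example only the D-to-G substitution at site 0 is fixed in two lineages, so it is the only candidate flagged as both fixed and parallel. A real analysis would derive the lineages from the phylogenetic tree rather than take them as given.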
Keywords/Search Tags: SARS-CoV-2, Point mutation, Viral evolution, Molecular phylogenetic tree, Adaptive mutation identification