Genomic Data-Based Research On Dynamic Adaptive Evolution Of Egg Cultivated H3N2 Influenza Virus

Posted on: 2020-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wang
GTID: 2370330572488224
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Influenza, caused by various influenza viruses, has become a major health threat worldwide. Among the most severe is the H3N2 influenza virus, which can cause pneumonia, respiratory failure and even death, posing a growing challenge to public health. Because the virus adapts to the medium in which it is cultured, and adaptation in egg media in particular reduces vaccine efficacy, the main purpose of this thesis is to combine computational biology and machine learning to explore the dynamic adaptive evolution of H3N2 influenza virus cultivated in egg media. Based on 69,362 filtered gene sequence samples from the GISAID database, I first build a phylogenetic tree of these samples using the GTR (General Time Reversible) model and the maximum likelihood method, and then use the Mutation Mapping method to simulate the mutation history on the terminal branches of this tree. After these preliminary steps, I identify 18 codon sites that are subject to intensive adaptive evolution in egg media through an Enrichment Test and a Convergent Test. Many samples that were possibly cultivated in eggs appear to be misclassified under the Unknown tag; ignoring these samples could lose relevant information about adaptive evolution modes that are detrimental to vaccine efficacy. Meanwhile, the proportion of egg samples is very small (only 1.2%), so to handle this imbalanced data problem I use XGBoost and AdaBoost, respectively, to learn and predict the medium tags in two datasets: virus samples submitted before 2016 and those submitted after 2016. Prediction on the latter is much more accurate than on the former, which further supports the conclusion that adaptive evolution among viruses cultivated in egg media is becoming increasingly intensive.

Considering that the more intensive the adaptive evolution among viruses cultivated in egg media, the greater the reduction in vaccine efficacy, I explore the potentially distinct clusters representing different adaptive evolution modes among all egg-cultivated viruses. To better understand and uncover how these modes change across years, I apply a Gibbs sampling based association analysis to this task. This offers a new clustering approach for revealing the typical codon sites, and their codons, that correspond to each cluster, and its results are clear and logical in biological meaning. The clustering results can provide valuable information for further improving vaccine efficacy.
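The abstract does not say which statistic the Enrichment Test uses. As an illustration only, the sketch below assumes a one-sided Fisher's exact test on a 2x2 table: terminal branches from egg-cultivated versus other samples, with or without a mutation at a given codon site. The function name `enrichment_p_value` and all counts are hypothetical and are not taken from the thesis.

```python
from math import comb


def enrichment_p_value(egg_mut, egg_total, other_mut, other_total):
    """One-sided Fisher's exact test for enrichment of a mutation in egg samples.

    Returns P(X >= egg_mut), where X is the number of mutated branches among
    egg_total branches drawn without replacement from all branches
    (hypergeometric null: the mutation is unrelated to the culture medium).
    This is an assumed stand-in for the thesis's Enrichment Test.
    """
    total = egg_total + other_total      # all terminal branches
    mut_total = egg_mut + other_mut      # all branches carrying the mutation
    denom = comb(total, egg_total)       # ways to choose the egg branches
    upper = min(egg_total, mut_total)    # largest possible mutated egg count
    tail = sum(
        comb(mut_total, k) * comb(total - mut_total, egg_total - k)
        for k in range(egg_mut, upper + 1)
    )
    return tail / denom


# Hypothetical counts for one codon site: 30 of 100 egg branches carry the
# mutation, against 50 of 5000 branches from other media.
p = enrichment_p_value(egg_mut=30, egg_total=100, other_mut=50, other_total=5000)
print(f"one-sided p-value: {p:.3g}")
```

Under this assumption, a site would be flagged as enriched when its p-value falls below a threshold corrected for the number of codon sites tested. A convergence criterion, such as the same substitution arising independently on several egg branches, would need a separate check.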
Keywords/Search Tags:H3N2 Influenza Virus, Egg Media Cultivation, Dynamic Adaptive Evolution, Computational Biology, Machine Learning