Predicting Interspecies Transmission And Antigenic Relationship Of Influenza A Viruses Based On Machine Learning Methods

Posted on: 2013-12-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
GTID: 1224330392955569
Subject: Computer application technology
Abstract/Summary:
Avian influenza viruses are a class of avian-adapted influenza A viruses. During the past decade, avian influenza viruses have claimed many human lives and caused widespread alarm. Influenza H3N2 virus is another class of influenza A virus with a significant impact on public health. Its antigenic variants reduce or even eliminate the effectiveness of the current vaccine, which complicates global influenza surveillance. Research on interspecies transmission and antigenic variation of these two kinds of influenza A viruses is therefore of great importance in both theory and practice. Based on machine learning, information theory and feature selection methods, this study improves the prediction models for avian-to-human transmission of avian influenza viruses and for the antigenic relationship of influenza H3N2 viruses. In addition, 90 signature amino acid positions for avian-to-human transmission of avian influenza viruses and 18 critical amino acid positions for antigenic variants of influenza H3N2 viruses are identified. The study can thereby provide early warning for public health and valuable clues for related research on molecular determinants and underlying mechanisms.
First, no avian influenza viruses have been experimentally confirmed to be unable to infect humans directly, so no reliable negative samples exist. One-class support vector machines have been applied successfully to problems where the negative class is not well defined, so we explored the feasibility of using a one-class support vector machine to predict avian-to-human transmission of avian influenza viruses. The final prediction model, constructed with amino acid composition, dipeptide composition and autocorrelation features, achieves good performance. Its prediction accuracy is higher than that of the previous back propagation neural network model.
Secondly, when we established the negative testing dataset in the preceding study, we found that our negative data were more reliable than the negative data used in the previous prediction model. Therefore, we increased the number of samples of both classes and constructed a traditional binary-class model to improve the prediction of avian-to-human transmission of avian influenza viruses. The 90 signature positions were selected with an entropy method. Based on four feature selection methods (Relief, mRMR, information gain and genetic algorithm), the optimal physicochemical feature subset was mined. The final prediction model constructed with the optimal feature subset performs considerably better than the other existing prediction models. The results of cross-validation and an independent test show that the final features and the model are effective for predicting the transmission of avian influenza viruses from birds to humans.
Thirdly, 394 antigenic relationship data points for H3N2 influenza virus were collected from related publications. Different scoring methods, including the phi coefficient, odds ratio and mutual information, were then compared. Based on a multiple linear regression model and the better scoring method (i.e., the phi coefficient), 18 amino acid positions were identified as critical for antigenic variants of H3N2 influenza virus. All 18 critical positions are located in the five epitopes of the HA protein. Additionally, 8 positions are identical to positive selection positions identified in other studies. The results indicate that the 18 positions play important roles in antigenic variants of H3N2 influenza virus.
Finally, based on the aforementioned work, we tried to improve the prediction model of the antigenic relationship of H3N2 influenza virus and reduce the number of false positives. Based on the hint that physicochemical property changes would be more informative for antigenic variants of H3N2 influenza virus, and using the physicochemical feature candidates selected by mutual information and hierarchical clustering, the final prediction model was constructed with stepwise multiple linear regression. The experimental results on the training and testing datasets indicate that our prediction model surpasses the existing prediction models, including the Hamming distance model, the group scoring model and the decision tree model. Furthermore, we developed a web tool named H3N2-AR to provide an online service for predicting the antigenic relationship of H3N2 influenza virus for researchers in this field.
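The abstract names the phi coefficient as the scoring method used to rank amino acid positions, but does not spell out the computation. The following is a minimal sketch of one plausible reading, not the dissertation's actual implementation: for each aligned position, it measures the association between "the two strains differ at this position" and "the pair is labeled an antigenic variant". The function names `phi_coefficient` and `score_positions`, the input layout, and the toy sequences are all hypothetical.

```python
import math


def phi_coefficient(xs, ys):
    """Phi coefficient between two equal-length sequences of booleans.

    Returns 0.0 when any margin of the 2x2 table is empty (a convention
    assumed here; the coefficient is undefined in that case).
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n11 = n10 = n01 = n00 = 0
    for x, y in zip(xs, ys):
        if x and y:
            n11 += 1
        elif x and not y:
            n10 += 1
        elif not x and y:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def score_positions(pairs, is_variant):
    """Score each aligned position across a set of strain pairs.

    pairs: list of (seq_a, seq_b) tuples of equal-length aligned sequences
           (hypothetical input layout).
    is_variant: list of booleans, True if the pair is an antigenic variant.
    """
    length = len(pairs[0][0])
    scores = {}
    for pos in range(length):
        mutated = [a[pos] != b[pos] for a, b in pairs]
        scores[pos] = phi_coefficient(mutated, is_variant)
    return scores


# Toy data, invented purely for illustration (not real HA sequences).
toy_pairs = [("AKT", "ART"), ("AKT", "AKS"), ("AKT", "ARS"), ("AKT", "AKT")]
toy_labels = [True, False, True, False]
toy_scores = score_positions(toy_pairs, toy_labels)
```

In this toy example, only the middle position changes exactly when the pair is labeled a variant, so it receives the highest score. How such scores feed into the multiple linear regression model is not described in the abstract and is left out of the sketch.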
Keywords/Search Tags: machine learning, support vector machine, multiple linear regression, feature selection, information theory, influenza A virus, interspecies transmission, antigenic relationship