
Study of Influenza A Virus Antigenicity Prediction Based on the Hemagglutinin Protein Sequence

Posted on: 2018-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: X H Li
GTID: 2334330512971582
Subject: Biology
Abstract/Summary:
Timely identification of emerging antigenic variants is critical to influenza vaccine design, flu surveillance, and human health. Traditional experimental methods (such as the hemagglutination inhibition test) predict well but have several shortcomings. First, they are time-consuming and laborious, and cannot keep pace with the dramatic increase in strains caused by virus mutation. Second, because of economic, material and other practical constraints, some experiments cannot be carried out, so the serological data available to us are relatively sparse (with a large number of missing values). Third, because the data come from different laboratories, they carry human and systematic errors, so the final serological data contain many low titer values. To accelerate the prediction of antigenic variation of influenza virus and to improve prediction quality, bioinformatics methods based on influenza virus protein sequences and HI-table data have been proposed. In this thesis, information extracted from the influenza A virus hemagglutinin (HA) protein is combined with serological data to predict influenza A virus antigenic variants. The main work is as follows:

1. We reviewed recent advances, in China and abroad, in predicting the antigenic variability of influenza A virus. This work focuses mainly on feature extraction from hemagglutinin protein sequences and on prediction algorithms. Commonly used features include binary representation and features derived from the properties of amino acids, among others. The prediction and classification algorithms include matrix completion, K nearest neighbor, support vector machine, logistic regression and lasso.

2. We proposed a novel sequence-based algorithm, joint random forest regression (JRFR), to directly predict the antigenic distance of influenza A virus. It combines 94 amino acid substitution matrices with HA1 sequences to predict the antigenic distance. The algorithm not only improves prediction accuracy but also predicts the antigenic variation of new viruses well.

3. We proposed a new matrix completion algorithm (BMCSI), based on the hemagglutinin protein sequence, to fill in and correct hemagglutination inhibition test data, which are sparse and contain many unstable values, so that the antigenic distances between viruses can be calculated more accurately. The antigenic distances between viruses were then mapped to two-dimensional space by multidimensional scaling (MDS) to visualize the antigenic distances of influenza A virus. On the 68-03 data, our method achieves 37% better prediction accuracy than the previous study (RMSE = 0.6586).
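
To make the idea of combining substitution matrices with HA1 sequences concrete, the following is a minimal sketch of one possible way to turn a pair of aligned sequences into a feature vector, with one feature per substitution matrix. It is not the thesis's JRFR implementation: the function name pair_features, the two toy matrices, their scores, and the default score for unlisted residue pairs are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A substitution matrix here is a lookup from a residue pair to a score.
SubMatrix = Dict[Tuple[str, str], float]


def make_symmetric(scores: SubMatrix) -> SubMatrix:
    """Return a copy of the scores with (b, a) added for every (a, b)."""
    out = dict(scores)
    for (a, b), s in scores.items():
        out[(b, a)] = s
    return out


# Hypothetical toy matrices with made-up scores. The thesis uses 94
# published amino acid substitution matrices, which are not reproduced here.
TOY_MATRICES: List[SubMatrix] = [
    make_symmetric({("K", "E"): 1.0, ("N", "D"): 0.5, ("S", "T"): 0.2}),
    make_symmetric({("K", "E"): 3.0, ("N", "D"): 1.0, ("S", "T"): 0.5}),
]


def pair_features(
    seq_a: str,
    seq_b: str,
    matrices: List[SubMatrix],
    default: float = 1.0,
) -> List[float]:
    """One feature per matrix: the summed score over positions that differ.

    Assumes both sequences are already aligned to the same length.
    Residue pairs missing from a matrix fall back to `default`
    (an illustrative choice, not taken from the thesis).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    features = []
    for matrix in matrices:
        total = 0.0
        for a, b in zip(seq_a, seq_b):
            if a != b:
                total += matrix.get((a, b), default)
        features.append(total)
    return features


if __name__ == "__main__":
    # Two short made-up fragments standing in for aligned HA1 sequences.
    print(pair_features("KNSA", "ENTA", TOY_MATRICES))
```

Feature vectors of this kind, paired with antigenic distances derived from HI data, could then be passed to a regression model such as a random forest. How JRFR actually builds and joins its features is not described in the abstract.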
Keywords/Search Tags: Antigenic variation, hemagglutinin protein sequence, matrix completion, random forest, substitution matrix