
Statistical Models Of Multiple Warning For Outbreak Of Influenza Pandemic

Posted on: 2015-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2250330425474425
Subject: Applied Mathematics
Abstract/Summary:
As a major branch of mathematical statistics, time series analysis follows the basic principle of mathematical statistics: it uses observed data to infer the properties of the population. In 1990, Jeffrey proposed Chaos Game Representation (CGR), a visual method for DNA sequences, and the method has since also been applied to protein sequences. In 2004, Zu-Guo Yu et al. proposed a CGR of protein sequences based on the detailed HP model to reveal their intrinsic characteristics. Studying protein sequences is therefore of great significance for establishing statistical models of multiple early warnings for influenza pandemic outbreaks, and this thesis focuses mainly on the protein sequences of the influenza virus. After the background is presented in the introduction, HA/NA protein sequences of the influenza virus from 1914 to 2012 are selected and statistical methods are used to obtain characteristics of influenza virus variation, which is important for influenza prevention. The details are as follows.

In Chapter one, the relevant background in bioinformatics is briefly outlined and the significance of protein sequences is introduced.

In Chapter two, HA protein sequences of the influenza virus are converted into CGR time series based on the detailed HP model, and the long-memory ARFIMA(p,d,q) model is introduced to model such series. Nine randomly selected H5N1 sequences are then analyzed, and ARFIMA(1,d,1) models are identified for them.

In Chapter three, a long-memory model combined with the CGR-walk model and integer-order differencing is used to predict influenza A virus HA protein sequences. CGR walks are constructed for 71 highly homologous protein sequences selected from 1943 to 2013, and ARFIMA(p,d,q) models are then fitted to them and used to predict their first 10 positions.
This model proves well suited to the data, and its predictions are very accurate.

In Chapter four, the HA/NA protein sequences of the influenza virus from 1914 to 2012 are studied. Specifically, the variances, lag-2 autocorrelation coefficients, and predictable percentages of the HA/NA protein sequences are calculated from their chaos game representation based on the detailed HP model. These statistics are significantly higher in pandemic years than in the immediately preceding years, whereas in non-pandemic years they are usually smaller. It is therefore concluded that these statistics characterize the variation of the influenza virus ahead of pandemic outbreaks and can serve as early-warning signals.

In Chapter five, the thesis is summarized and directions for future work are outlined.
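The CGR construction of Chapters two and three can be sketched as follows. This is a minimal sketch, not the thesis's code. It assumes the four-class detailed HP grouping commonly attributed to Yu et al. (non-polar, negative polar, uncharged polar, positive polar), a hypothetical assignment of those classes to the corners of the unit square, and a hypothetical definition of the CGR walk as the cumulative sum of a mean-centred coordinate series. Each new point is the midpoint between the previous point and the vertex of the current residue's class.

```python
import numpy as np

# Assumed detailed-HP grouping; verify against the thesis before use.
HP_GROUPS = {
    **dict.fromkeys("AILMFPVW", "nonpolar"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("CGNQSTY", "uncharged"),
    **dict.fromkeys("HKR", "positive"),
}

# Hypothetical placement of the four classes on the corners of the unit square.
VERTICES = {
    "nonpolar": (0.0, 0.0),
    "negative": (1.0, 0.0),
    "uncharged": (1.0, 1.0),
    "positive": (0.0, 1.0),
}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return an (n, 2) array of CGR points for a protein sequence.

    Each point is the midpoint between the previous point and the vertex
    of the current residue's HP class. Unknown letters raise KeyError.
    """
    pts = np.empty((len(seq), 2))
    x, y = start
    for i, aa in enumerate(seq.upper()):
        vx, vy = VERTICES[HP_GROUPS[aa]]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        pts[i] = (x, y)
    return pts


def cgr_walk(series):
    """Hypothetical CGR walk: cumulative sum of the mean-centred series."""
    s = np.asarray(series, dtype=float)
    return np.cumsum(s - s.mean())
```

For example, `cgr_points("MKAIL")[:, 0]` gives the x-coordinate time series of a short fragment. Taking `np.diff` of a walk is one way to perform the integer-order differencing mentioned in Chapter three.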
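The "long memory" in ARFIMA(p,d,q) comes from the fractional difference operator (1 - B)^d, whose binomial expansion has coefficients pi_0 = 1 and pi_k = pi_{k-1}(k - 1 - d)/k. The abstract does not say how d is estimated. As a stand-in, the sketch below uses the Geweke-Porter-Hudak (GPH) log-periodogram regression, a standard semiparametric estimator, which may differ from the method used in the thesis. The bandwidth m = n^0.5 is an assumed default.

```python
import numpy as np


def frac_diff_weights(d, n):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k for k = 0..n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(x, d):
    """Apply (1 - B)^d to x, truncating the expansion at the sample start."""
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    # y_t = sum_{k=0}^{t} pi_k * x_{t-k}
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])


def gph_estimate_d(x, power=0.5):
    """GPH log-periodogram estimate of the memory parameter d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)  # number of low Fourier frequencies used
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    fx = np.fft.fft(x - x.mean())
    periodogram = np.abs(fx[1 : m + 1]) ** 2 / (2 * np.pi * n)
    # log I(lam_j) = c - d * log(4 sin^2(lam_j / 2)) + error
    regressor = np.log(4 * np.sin(lam / 2) ** 2)
    slope, _ = np.polyfit(regressor, np.log(periodogram), 1)
    return -slope
```

With d = 1 the weights reduce to ordinary first differencing (1, -1, 0, ...). Values of d between 0 and 0.5 indicate stationary long memory.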
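The prediction step of Chapter three can be illustrated with a simplified model. The thesis fits ARFIMA(p,d,q) models such as ARFIMA(1,d,1). To keep the example short, this sketch assumes d is already known and fits only an ARFIMA(1,d,0): it fractionally differences the centred series, estimates an AR(1) coefficient by least squares, forecasts the differenced series, and then inverts (1 - B)^d recursively. The function name `forecast_arfima_1d0` is hypothetical.

```python
import numpy as np


def frac_diff_weights(d, n):
    """Coefficients pi_k of (1 - B)^d for k = 0..n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def forecast_arfima_1d0(x, d, steps):
    """Point forecasts from a simplified ARFIMA(1,d,0) fit with known d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = frac_diff_weights(d, n + steps)
    mu = x.mean()
    xc = x - mu
    # Fractionally difference the centred series (truncated at the start).
    y = np.array([w[: t + 1] @ xc[t::-1] for t in range(n)])
    # Least-squares AR(1) coefficient for the differenced series.
    phi = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    ext = np.concatenate([xc, np.zeros(steps)])
    y_next = y[-1]
    for t in range(n, n + steps):
        y_next = phi * y_next  # AR(1) point forecast of y_t
        # Invert (1 - B)^d: x_t = y_t - sum_{k>=1} pi_k * x_{t-k}
        ext[t] = y_next - w[1 : t + 1] @ ext[t - 1 :: -1]
    return ext[n:] + mu
```

Because the expansion is truncated at the first observation, the inversion exactly undoes the differencing. With d = 0 the procedure reduces to a plain AR(1) forecast.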
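Two of the Chapter four early-warning statistics, the variance and the lag-2 autocorrelation coefficient, can be computed as in the sketch below. It assumes they are evaluated on a CGR coordinate series for each year and that a warning means the statistic for a given year exceeds the one for the preceding year. The "predictable percentage" is not defined in the abstract, so it is left out. All function names are hypothetical.

```python
import numpy as np


def lag_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return (xc[lag:] @ xc[:-lag]) / (xc @ xc)


def warning_stats(x):
    """Sample variance and lag-2 autocorrelation of one series."""
    x = np.asarray(x, dtype=float)
    return {"variance": x.var(ddof=1), "acf_lag2": lag_autocorr(x, 2)}


def flags_rise(current, previous):
    """True for each statistic that is higher now than in the previous year."""
    return {k: current[k] > previous[k] for k in current}
```

In use, `flags_rise(warning_stats(series_year), warning_stats(series_prev_year))` shows which statistics rose. According to the abstract, rises of this kind are characteristic of pandemic years.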
Keywords/Search Tags: influenza A virus, protein sequence, time series model, ARFIMA(p,d,q) model, prediction, CGR-walk model, early-warning signals, predictable percentage