Prediction Of Biological Characteristics Of Influenza Virus Based On Machine Learning

Posted on:2024-09-28

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Fan

Full Text:PDF

GTID:2530307067472124

Subject:Cyberspace security

Abstract/Summary:

The influenza virus genome consists of eight segments of varying lengths, with a total length of approximately 13 kb. Because the viral RNA polymerase replicates the genome in an error-prone way, viral genes readily acquire point mutations. Combined with genome reassortment, these mutations drive rapid evolution of the virus, which can change its biological properties and threaten human health. Two biological properties closely related to public health currently deserve attention: 1) the risk that naturally circulating avian influenza viruses spill over and infect humans, and 2) the pathogenicity of influenza B viruses transmitted between humans. Because genetic and protein sequences can be treated as strings over fixed character sets, machine learning methods can be used to model and predict these properties of infectious diseases, supporting early surveillance and prevention. This study builds predictive models for two scientific questions: the spillover risk of avian influenza viruses and the pathogenicity of influenza B viruses.

1) A deep learning model for predicting the spillover risk of avian influenza viruses. Avian influenza genome data were collected, and the dataset was divided into clades based on phylogenetic relationships. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) were combined to represent the genomic sequences, and models were trained and tested on individual clade datasets and on the entire dataset. The clade-specific models performed well on their own clades, with AUROC (area under the receiver operating characteristic curve) values above 0.966 and AUPR (area under the precision-recall curve) values above 0.876, but their generalization ability was limited. The global model achieved AUROC and AUPR values of 1.000 on all clades except H9N2. Ablation experiments showed that the attention mechanism and the sequence embedding method have a significant impact on model performance. In further tests of generalization, the transfer models achieved AUROC and AUPR values above 0.984 and 0.941, respectively. Finally, the attention weight matrices were visualized to make the model interpretable.

2) An ensemble learning model for predicting the pathogenicity of influenza B viruses. A dataset of influenza B virus protein sequences was constructed, and 40 key amino acid positions were selected by entropy-based ranking. Two types of information features, class information and probability information, were generated with the random forest method, and the optimal feature subset was selected with the Minimum Redundancy Maximum Relevance (mRMR) algorithm. Using sequential forward search, the class-information features were reduced to four dimensions, giving an accuracy (ACC) of 94.2% and a Matthews correlation coefficient (MCC) of 88.4%; the probability-information features were reduced to three dimensions, giving an ACC of 94.1% and an MCC of 88.2%. The optimal feature subsets outperformed the individual original features. Finally, sequential forward search was compared with two common ensemble learning methods, and the optimal subset it produced performed relatively well.
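
The abstract says 40 key amino acid positions were chosen by entropy-based ranking but does not give the exact formula. A minimal sketch, assuming per-column Shannon entropy over a multiple sequence alignment and that the most variable (highest-entropy) positions are kept; the function names and toy sequences are hypothetical:

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_positions_by_entropy(aligned_seqs, k):
    """Return the k column indices (0-based) with the highest entropy.

    Assumes the sequences are already aligned to a common length.
    Ties are broken in favour of the lower column index.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    scored = [(column_entropy([s[i] for s in aligned_seqs]), i) for i in range(length)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Toy alignment of four short protein fragments (hypothetical data).
seqs = ["MKAIL", "MKTIL", "MRAVL", "MKSIL"]
print(rank_positions_by_entropy(seqs, 2))  # -> [2, 1]
```

Conserved columns score 0 and drop to the bottom of the ranking; with real data, k would be set to 40 as in the abstract.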
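
The abstract names the Minimum Redundancy Maximum Relevance (mRMR) algorithm without specifying the variant or the mutual-information estimator. A minimal sketch, assuming discrete features and the common MID criterion (relevance to the label minus mean redundancy with already-selected features); continuous inputs such as random-forest class probabilities would first need discretization. Function names and data are hypothetical:

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR selection with the MID score (relevance minus mean redundancy).

    features: dict mapping feature name -> list of discrete values
    labels:   class labels, same length as every feature column
    Returns up to k feature names in the order they were selected.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    remaining = sorted(features)  # sorted so ties resolve deterministically
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the label is determined by site_a and site_b together,
# and site_a_copy duplicates site_a exactly.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "site_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "site_b": [0, 1, 0, 1, 0, 1, 0, 1],
    "site_a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr(features, labels, 2))  # -> ['site_a', 'site_b']
```

All three features are equally relevant to the label, but once site_a is chosen its duplicate carries no new information, so the redundancy penalty pushes site_b ahead of it.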
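
The abstract reduces each feature type to a few dimensions with sequential forward search. A minimal sketch of the standard greedy form, assuming the search starts from an empty set, adds whichever feature most improves a caller-supplied score, and stops when nothing improves it. The thesis's actual evaluation criterion and stopping rule are not stated, and `toy_score` is purely hypothetical:

```python
def sequential_forward_selection(candidates, evaluate, max_features=None):
    """Greedy sequential forward selection.

    candidates:   feature names to choose from
    evaluate:     callable taking a list of feature names and returning a score
                  (higher is better), e.g. cross-validated accuracy
    max_features: optional cap on the subset size
    Stops as soon as no remaining feature improves the best score.
    Returns (selected_features, best_score).
    """
    remaining = list(candidates)
    selected = []
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        trials = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(trials, key=lambda t: t[0])
        if score <= best_score:
            break  # no candidate improves the subset any further
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical scoring function: each feature adds a fixed gain,
# and every extra feature costs 0.1 (a stand-in for overfitting).
gain = {"f1": 0.5, "f2": 0.3, "f3": 0.05, "f4": 0.0}


def toy_score(subset):
    return sum(gain[f] for f in subset) - 0.1 * len(subset)


chosen, best = sequential_forward_selection(gain, toy_score)
print(chosen, round(best, 3))  # -> ['f1', 'f2'] 0.6
```

In practice `evaluate` would train a classifier on the candidate subset and return a cross-validated metric such as ACC or MCC, which is what makes the search stop at a small subset like the four- and three-dimensional ones reported above.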
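
The abstract reports ACC and MCC for the pathogenicity classifier. MCC ranges from -1 to 1, so the reported 88.4% corresponds to an MCC of 0.884. A minimal sketch of both metrics for binary labels, assuming the pathogenic class is encoded as 1; the labels below are hypothetical:

```python
import math


def acc_and_mcc(y_true, y_pred, positive=1):
    """Accuracy and Matthews correlation coefficient for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any confusion-matrix margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(acc_and_mcc(y_true, y_pred))  # -> (0.75, 0.5)
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when pathogenic and non-pathogenic strains are unevenly represented.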

Keywords/Search Tags:

Influenza Virus, Machine Learning, Spillover Risk, Pathogenicity

Related items

1	Pathogenicity And Transmissibility Of Two Epidemic Strains Of Influenza Virus Isolated From Hebei Province
2	Pathogenicity And Cross-species Transmission Of Influenza A (H7N9)Virus Isolated From Environment
3	Molecular Characteristics And Pathogenicity Of H6 Subtype Avian Influenza Virus In Guangdong Province From 2014 To 2017
4	Investigation Of H5N6 Influenza A Viruses In Pig Farms And The Pathogenicity Analysis Of H1N1 Recombinant Strain In Mices
5	The Study Of HA And NA Genetic Characteristics And Pathogenicity In Mice Of Influenza B Virus In Guangxi During 2018～2021
6	Mammalian Pathogenicity Study Of An H9N2 Influenza A Virus Isolated From A Severely Ill Nine-Year-Old Patient
7	Prediction Of Biological Characteristics Of Influenza Virus Based On Machine Learning
8	Machine Learning Prediction And Biological Verification Of Host Adaptation Of NP Gene Of Influenza Virus
9	Phylogenetic Analysis And Pathogenicity To Mouse Of H6 Subtype Avian Influenza Virus Isolated From 2014 To 2016 In Partial Area Of China
10	Effects Of PA-X And PB1-F2 On The Pathogenicity Of H1N1 Swine Influenza Virus