| Influenza viruses cause influenza, and the repeated outbreaks of recent decades have posed serious challenges to global human health and safety. Studying the interactions between influenza virus proteins and human host proteins can reveal, at the molecular level, the mechanisms by which influenza viruses infect humans and cause disease, and can guide the research and development of anti-influenza drugs. Traditional biological experiments for identifying these interactions are time-consuming and labor-intensive, and their results suffer from limitations such as high false-positive rates. With advances in computing, machine learning methods have been widely applied to the study of pathogenic microorganisms with good results. This paper therefore applies machine learning to predicting interactions between influenza virus proteins and human host proteins. (1) Protein sequences are encoded with three feature engineering methods: amino acid composition, pseudo amino acid composition, and joint triad. Each encoding is combined with seven machine learning models: decision tree, AdaBoost (adaptive boosting) decision tree, gradient boosting decision tree, random forest, naive Bayes, logistic regression, and support vector machine. The performance of each combination of feature engineering method and learning algorithm is then evaluated on this prediction problem. The experimental results show that random forest performs best. The proposed model combining pseudo amino acid composition with random forest achieves an accuracy of 75.19% and an F1 score of 78.65%. Feature ablation experiments show that pseudo amino acid composition is the more important feature for the random forest algorithm, whereas the joint triad is the more important feature for the support vector machine. In comparison experiments, the proposed method also outperforms other methods in the area under the ROC curve (AUC), with the pseudo amino acid composition and random forest model reaching an AUC of 0.849. These results indicate that this model predicts interactions between influenza virus and human host proteins well.
(2) The pseudo amino acid composition and random forest model is then used to predict whether each of 9155 human proteins interacts with influenza virus proteins. GO and KEGG enrichment analyses of the 2963 human proteins predicted to interact show that these proteins are mainly localized to mitochondria and nuclei. They may also participate in various virus-related biological processes such as endocytosis, protease production, and viral gene transcription. These results suggest that the predicted human host proteins are closely related to influenza virus infection in humans and can serve as a reference for subsequent experimental validation and for the development of targeted drugs. |
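The abstract names the amino acid composition and joint triad encodings but does not spell them out. The code below is a minimal Python sketch, not the thesis's implementation. It computes a 20-dimensional amino acid composition and a 343-dimensional joint (conjoint) triad vector, using the seven-class residue grouping commonly used for the triad encoding. Several details are hypothetical illustrations: the function names, the choice to normalize triad counts to frequencies (other normalizations exist), and the `pair_features` helper that concatenates virus and human encodings. Pseudo amino acid composition is omitted because it depends on physicochemical property tables that are not given here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order for the AAC vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven residue classes commonly used for the joint (conjoint) triad encoding.
TRIAD_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(TRIAD_CLASSES) for aa in group}

# All 7^3 = 343 class triplets, each mapped to a fixed vector position.
TRIADS = list(product(range(len(TRIAD_CLASSES)), repeat=3))
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def amino_acid_composition(seq: str) -> list[float]:
    """20-dim AAC: fraction of each standard residue in the sequence."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def conjoint_triad(seq: str) -> list[float]:
    """343-dim joint triad vector: frequency of each 3-residue class pattern.

    Non-standard residues are dropped before triads are formed (a
    simplification). Counts are normalized to frequencies here, which is
    only one of several normalizations used in practice.
    """
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TRIADS)
    return [c / total for c in counts]


def pair_features(virus_seq: str, human_seq: str) -> list[float]:
    """Hypothetical pair encoding: concatenate AAC + triad for both proteins."""
    return (amino_acid_composition(virus_seq) + conjoint_triad(virus_seq)
            + amino_acid_composition(human_seq) + conjoint_triad(human_seq))


if __name__ == "__main__":
    vec = pair_features("MKAIIVLLYTFATA", "MSDNGELEDKPPAPPVRMSSTIFST")
    print(len(vec))  # 2 * (20 + 343) = 726 features per protein pair
```

The resulting fixed-length vectors could be passed to any of the classifiers listed above, such as a random forest. The abstract does not state how the thesis combined the two proteins' features, so the concatenation here is only one plausible choice.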