Genotyping Of Influenza A Virus And Prediction Of Human-host Protein Interactions Based On Machine Learning

Posted on:2023-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

GTID:2530306842968779

Subject:Agricultural Information Engineering

Abstract/Summary:

Over the past two decades, influenza, and avian influenza in particular, has had a significant impact on the poultry industry, animal husbandry and other agricultural sectors, causing huge economic losses while also posing a serious threat to human health. Because the influenza virus is the root cause of the disease, it remains an active research subject. Two representative problems are the genotyping of the influenza virus and the identification of the virus-host protein interaction pairs involved in human infection. Traditional biological experimental methods for these two problems are time-consuming and labor-intensive, their accuracy needs to be improved, and they do not generalize well. With the development of information technology, machine learning has been applied in more and more fields, including research on pathogenic microorganisms. In this thesis, machine learning is applied to these two problems for influenza A virus as follows.

(1) For the genotyping of influenza A virus, many studies have focused on the model, whereas this work mainly explores how different features in the machine learning model affect the predicted genotype. In contrast to the protein features widely used in existing feature extraction methods, this study uses dinucleotide features derived from nucleic acid sequences and word vector features derived from protein sequences, and applies them to four machine learning classification models: decision tree (DT), k-nearest neighbors (KNN), naive Bayes (NB) and support vector machine (SVM). The ProtVec method gives the best results: accuracy reaches 100% for hemagglutinin genotyping (H type) and 99.95% for neuraminidase genotyping (N type). The results show that the method proposed in this study can effectively predict the genotype of influenza A virus.

(2) For the prediction of interactions between influenza A virus and human proteins, the protein sequence-based word vector features are used again, and their performance on the interaction prediction problem is examined. In this part of the work, a dataset of positive and negative samples is constructed first. Because the positive and negative samples are imbalanced, three datasets with positive-to-negative ratios of 1:5, 1:8 and 1:10 are used for training, and the same four classification models (DT, KNN, NB and SVM) are applied. The final experimental results show that the ProtVec method achieves an accuracy of 90.09% and an F1-score of 90.09% on the 1:10 dataset, an accuracy of 89.13% and an F1-score of 88.89% on the 1:8 dataset, and an accuracy of 83.33% and an F1-score of 83.33% on the 1:5 dataset. The results show that the method proposed in this study can effectively predict interactions between influenza A virus and human proteins.
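
The feature and dataset steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the choice of non-overlapping 3-mers at three reading offsets (the tokenization usually associated with ProtVec), the reading of 1:5, 1:8 and 1:10 as positive-to-negative ratios, and the random subsampling of negatives are all assumptions made for illustration.

```python
import random
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides in a fixed order, so feature vectors are comparable.
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return the 16 relative dinucleotide frequencies of a nucleic acid sequence.

    Overlapping pairs are counted; pairs containing symbols other than
    A, C, G, T (for example N) are skipped.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def protvec_kmers(protein, k=3):
    """Split a protein into k lists of non-overlapping k-mers, one per offset.

    These k-mer "words" are what a word-embedding model would be trained on;
    the embedding step itself is outside this sketch.
    """
    return [
        [protein[i:i + k] for i in range(offset, len(protein) - k + 1, k)]
        for offset in range(k)
    ]


def subsample_negatives(positives, negatives, ratio, seed=0):
    """Keep all positives and draw at most ratio * len(positives) negatives.

    Hypothetical helper: the abstract does not say how its 1:5, 1:8 and
    1:10 datasets were built.
    """
    rng = random.Random(seed)
    n_negatives = min(len(negatives), ratio * len(positives))
    return positives, rng.sample(negatives, n_negatives)
```

For example, `dinucleotide_frequencies("ACGT")` assigns a frequency of 1/3 each to AC, CG and GT, and `subsample_negatives(pos, neg, 10)` would yield a 1:10 dataset when enough negatives are available. The resulting vectors could then be passed to any of the four classifiers the abstract lists.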

Keywords/Search Tags:

genotyping, protein interactions, machine learning, feature engineering, dinucleotides, ProtVec

Related items

1	The Prediction Of Protein Interactions Based On Integrated Learning Model
2	Predicting Protein-protein Interactions From Protein Sequence Based On Multiple Feature Extractions
3	The Research Of Predicting Hot Spots At Protein-Protein Interface Based On ELM
4	Predicting Protein Protein Interactions And Its Active Sites Based On Data Mining Algorithm
5	Prediction Of Hot Spots And Feature Analysis Of Hot Regions At Protein-DNA Binding Interfaces
6	Research On Predicting Protein-protein Interactions Based On Machine Learning
7	Predicting Protein-protein Interactions Based On Machine Learning Algorithms Using Logistic Regression Model To Improve Accuracy Of Peptide Identification In Mass Spectrometry Analysis
8	Prediction Of RNA-protein Interactions Based On Machine Learning
9	Research On Predicting Protein-protein Interactions Based On Relevance Vector Machine
10	Feature Engineering Design And Interpretation Of ECG Signals