
Research And Application On Virus-Host Association Prediction Method Based On Deep Learning

Posted on: 2024-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Tian
GTID: 2530306938451654
Subject: Computer technology
Abstract/Summary:
Viruses are microbes that infect host cells and rely on their hosts for reproduction. They not only threaten human health but also have severe implications for many agricultural and environmental systems. Consequently, studying the relationship between viruses and their hosts holds significant practical value in virology, microbiology, and medicine. Historically, virus-host interactions have been identified mainly through biological experiments, such as the double-layer plate method, genomic affinity chromatography, and sequencing. However, these traditional experimental approaches tend to be time-consuming, labor-intensive, and costly, which has made computational prediction of virus-host interactions an active research topic. This study explores the feasibility and effectiveness of using deep learning to predict virus-host interactions, with the aim of improving the accuracy and speed of such predictions and supporting in-depth research on virus-host interaction. First, this thesis collected 13,190 viral genomes and 367,271 host genomes from four widely used bioinformatics databases and identified 4,834 virus-host interactions across 69 bacterial genera, which serve as positive samples. Because negative samples are scarce, two techniques were applied to select them: a sequence similarity-based approach and a clustering-based approach, yielding two different types of negative virus-host interaction samples. Second, in the feature extraction phase, a k-mer method that accounts for sequence errors was used. Two deep learning-based virus-host prediction models were then developed, each tailored to a distinct application scenario to reflect the variability in sample types. Experiments showed improved model performance with k-mer feature extraction using K=5 and L=3 (gap distances of 0, 1, and 2), Levenshtein distance as the sequence similarity measure, and a 1:5 positive-to-negative sample ratio. The two prediction models also suit different types of bacteria. Finally, Phage HP, a virus-host prediction software package available for Windows and as a web application, was developed, enabling users to efficiently process sequence data and predict virus-host relationships. In tests on laboratory data, the methods presented in this thesis achieved an accuracy of 95%, outperforming other predictive tools.
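The abstract names gapped k-mer features (K=5, gap distances of 0, 1, and 2) and Levenshtein distance but does not define either procedure. The following Python sketch illustrates one plausible reading and is not the thesis implementation. It assumes that a gap distance g means skipping g bases between consecutive sampled positions, and it uses the textbook dynamic-programming edit distance. The function names `gapped_kmer_counts` and `levenshtein` are hypothetical.

```python
from collections import Counter


def gapped_kmer_counts(seq, k=5, gaps=(0, 1, 2)):
    """Count k-mers sampled with g skipped bases between picks (assumed reading).

    With g = 0 this reduces to ordinary contiguous k-mer counting.
    Keys are (gap, kmer) pairs so counts for different gaps stay separate.
    """
    counts = Counter()
    seq = seq.upper()
    for g in gaps:
        step = g + 1
        span = (k - 1) * step + 1  # window length covered by one gapped k-mer
        for i in range(len(seq) - span + 1):
            kmer = seq[i:i + span:step]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[(g, kmer)] += 1
    return counts


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    features = gapped_kmer_counts("ACGTACGTAC")
    print(features[(0, "ACGTA")], features[(1, "AGAGA")])
    print(levenshtein("ACGTA", "ACGGA"))
```

Keying the counts by gap keeps the three gap settings as separate feature groups. Whether the thesis concatenates or merges them, and where exactly the Levenshtein distance enters the pipeline, is not stated in the abstract.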
Keywords/Search Tags:Virus-host association, K-mer, Negative sample selection, Deep learning