
Robust Classification Of Bacterial And Viral Infections Via Host Defensin Gene Expression Profiles

Posted on: 2019-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zhou
GTID: 2404330572953298
Subject: Biomedical engineering
Abstract/Summary:
Infection is the invasion of an organism's body tissues by disease-causing agents, their multiplication, and the reaction of host tissues to these agents and the toxins they produce. Different types of pathogen infection call for different treatment strategies, such as antibiotics for bacterial infection and antifungal drugs for fungal infection. Accurate identification of the infecting pathogen type is therefore of great significance to the clinical treatment of infectious diseases. Traditional pathogen detection methods based on morphology, immunology and molecular biology are widely used to identify clinical infection types. In recent years, a new direction has emerged: distinguishing infection types from host gene expression profiles, using machine learning methods to identify biomarkers for different types of pathogens. This kind of indirect, bioinformatics-based diagnosis can provide auxiliary and supplementary information for the traditional direct detection methods.

Defensins are endogenous antimicrobial peptides that help the host defend against bacterial, fungal and viral infections, and they play important roles in innate immunity against bacterial and viral infection. It may therefore be possible to distinguish viral from bacterial infection by analyzing the expression of defensin genes and their related genes during infection.

In this study, we selected four datasets from the GEO database, each containing samples of at least one infection type and allowing defensin expression values to be extracted. The dataset of the best quality served as the training dataset, and the remaining three served as validation datasets. Each dataset was preprocessed by data normalization, missing-value processing and data integration, and gene expression values were then estimated from probe expression values. By analyzing gene expression levels in the training dataset, we derived 49 genes specific to bacterial or viral infection. We built five classifiers using five machine learning methods (k-nearest neighbor, Bayesian classification, support vector machine, decision tree and random forest) and tuned the parameters of each according to leave-one-out cross-validation results to obtain the optimal classification model.

We evaluated the classification performance of the 49 genes with the different classification methods on the four datasets. The results showed that the 49 genes distinguish bacterial from viral infection well, with random forest performing best. We also compared our 49-gene set with biomarkers from other reports on our four datasets: it performed better than some of them and comparably to the others. By calculating the contribution of each gene, we reduced the 49-gene set to a 10-gene set, which also classified the infection types well.

Accurate identification of bacterial and viral infections has important clinical value and can help clinicians choose appropriate treatments, so finding biomarkers that accurately identify them is of great significance. Our study attempts to establish a new machine learning method for identifying pathogen infection types from the expression profiles of defensin genes and related genes, which can usefully supplement traditional clinical pathogen detection techniques. Our results also indicate that different defensins may play important roles during bacterial or viral infection, which provides clues for using certain defensin genes as biomarkers to discriminate between the two. Although the study is currently limited to gene chip (microarray) expression profiling data, its basic strategy and the related machine learning models can be applied to expression profile data generated by next-generation sequencing, for potential future clinical applications.
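The parameter tuning by leave-one-out cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual code: it covers only the k-nearest neighbor classifier (one of the five methods named), uses only the Python standard library, and the function names (`knn_predict`, `loocv_accuracy`, `select_k`) and the two-feature toy expression values are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to x.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def select_k(X, y, candidates):
    """Return the candidate k with the highest LOOCV accuracy (ties go to the smaller k)."""
    scores = {k: loocv_accuracy(X, y, k) for k in candidates}
    best_k = max(sorted(scores), key=lambda k: scores[k])
    return best_k, scores


# Hypothetical toy data: two made-up expression features per sample.
# These values are NOT from the thesis or from GEO.
X = [
    [1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [1.1, 0.8],
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0], [3.1, 3.2],
]
y = ["bacterial"] * 4 + ["viral"] * 4

best_k, scores = select_k(X, y, candidates=[1, 3, 5])
print(best_k, scores)
```

Leave-one-out is a natural choice when a training set has few samples, because every sample is used for testing exactly once while the model is still trained on all the others. In practice the same loop would wrap each of the five classifiers and its own parameter grid.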
Keywords/Search Tags: machine learning, infection type, host defensin gene, classifier