
Research On Sample Classification Algorithm And Marker Discovery Based On Amplicon Sequencing Data

Posted on: 2021-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wei
Full Text: PDF
GTID: 2404330605461396
Subject: Computer technology
Abstract/Summary:
In recent years, the intestinal microbiome has become a research hotspot because it is closely related to human health and disease, and the search for disease-related microbial markers is an important research direction. With the development of high-throughput sequencing, more and more microorganisms that cannot be cultured in the laboratory can be detected by sequencing, and the volume of intestinal microbiome data is growing explosively. There is an urgent need to study or introduce new machine learning algorithms to find microbial markers of related diseases. This thesis studies machine learning algorithms for disease classification based on intestinal microbial data. On this basis, it studies feature selection methods for microbial marker discovery and implements a microbial marker database. The main work is as follows.

First, the LightGBM algorithm was introduced to classify diseases from intestinal microbial data. The relationship between disease and intestinal microbes can be treated as a supervised classification problem. In this study, LightGBM, a relatively new algorithm, was applied to microbiome-based disease classification and compared with several methods commonly used on microbiome data, namely deep forest, random forest and support vector machine (SVM), on intestinal microbial data for ten diseases. In the experiments, microbial abundance data served as the input, the intestinal microorganisms as the features, and the disease status of each sample (diseased or not) as the class label. The experiments systematically evaluated the classification performance of the four algorithms across different diseases and across different datasets of the same disease. Under 5-fold cross-validation, LightGBM performed best on multiple datasets.

Second, feature selection algorithms for microbial marker discovery were evaluated and a database was constructed. To begin with, the classification performance of an SVM classifier before and after feature selection was compared and analyzed on 27 datasets covering 10 diseases. Next, the overlap among the biomarkers screened by seven representative feature selection methods was evaluated on three diseases. Feature selection makes it possible to screen out disease-related biomarkers, and on the datasets of diseases related to intestinal microorganisms the mRMR and ReliefF algorithms performed well. Finally, based on the experimental results, a database of diseases and intestinal microbes was built, and a web page lets users query the markers associated with a disease.

In summary, this thesis introduced the LightGBM algorithm for classifying diseases from intestinal microbial data in order to find associations between diseases and intestinal microorganisms, evaluated seven representative feature selection methods for finding relevant biomarkers, and constructed a database of diseases and intestinal microorganisms.
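The abstract does not say how the overlap ("coincidence") among the biomarkers chosen by different feature selection methods was measured. One common choice is the pairwise Jaccard index, sketched below as an assumption rather than as the thesis's actual procedure. The function names, the third method name `method_3`, and the taxon names are hypothetical placeholders, and the selections are made-up illustration data, not results from the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b|; defined here as 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(selected):
    """Map each unordered pair of method names to the Jaccard index of their marker sets."""
    return {
        (m1, m2): jaccard(selected[m1], selected[m2])
        for m1, m2 in combinations(sorted(selected), 2)
    }


def consensus_markers(selected, min_methods):
    """Return, in sorted order, the markers chosen by at least `min_methods` methods."""
    counts = {}
    for markers in selected.values():
        for marker in set(markers):
            counts[marker] = counts.get(marker, 0) + 1
    return sorted(m for m, c in counts.items() if c >= min_methods)


# Hypothetical placeholder selections (NOT results from the thesis).
selected = {
    "mRMR": {"taxon_A", "taxon_B", "taxon_C"},
    "ReliefF": {"taxon_B", "taxon_C", "taxon_D"},
    "method_3": {"taxon_C", "taxon_E"},
}

overlap = pairwise_overlap(selected)
consensus = consensus_markers(selected, min_methods=2)
```

With seven methods, `pairwise_overlap` would yield 21 pairs. Markers returned by `consensus_markers` with a high `min_methods` threshold are those on which several methods agree, which is one possible way to shortlist candidates for a marker database.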
Keywords/Search Tags: Classification prediction, Feature selection, Intestinal microorganisms, Biomarker, Database