| The gut microbiota, a lifelong symbiont of the human host, is often referred to as the "second genome" because of its close relationship with the host, its species diversity, and its large population. Previous research has demonstrated varying degrees of association between the gut microbiota and numerous diseases. Furthermore, studies have revealed that the gut microbiota possesses unique "fingerprint features" and can be used to diagnose disease and identify individual-specific information such as health condition, gender, and age. However, current domestic and international research struggles with limited sample sources, simplified host backgrounds, and inadequate analysis methods. Most studies focus on diagnosis and identification under a single influencing factor, while research on complex mixed microbial communities with diverse sample sources, various diseases, and significant individual variation remains relatively limited. In light of these circumstances, this study selected four diseases as research targets: drug use disorder, type 2 diabetes, Parkinson’s syndrome, and cholecystitis. Gut microbiota samples were collected from 1,072 volunteers in four regions (Shanghai, Yunnan, Gansu, and Shandong) who varied in gender, age, height, weight, and other factors. Based on 16S rRNA gene sequencing, a total of 16 machine learning algorithms and 20 regression classification methods were employed. First, individual disease diagnosis models were constructed. Subsequently, by combining the aforementioned microbiota sequencing data, an individual identification model was established for the complex backgrounds of disease, region, gender, age, weight, height, body mass index, and nicotine and alcohol preference, across eight categories of single features. This research provides a theoretical foundation and practical exploration for the application of gut microbiota in disease diagnosis and individual identification. The main results 
were as follows: 1. Nineteen potential bacterial markers were selected from the drug use disorder group. Among them, bacteria related to indole and short-chain fatty acid metabolism showed significant changes, which are likely to cause intestinal inflammation, irritable bowel syndrome, depression, etc. For example, the up-regulation of Prevotella_9 is likely to exacerbate intestinal inflammation and lead to schizophrenia. The up-regulation of Alistipes is likely to affect the level of tryptophan, a precursor of serotonin, and lead to depression. Down-regulated abundance of Faecalibacterium can cause inflammatory bowel disease. Down-regulated abundance of Dorea compromises the integrity of the intestinal mucosal barrier. 2. Twenty-two potential bacterial markers were selected from the type 2 diabetes group. The abundances of butyric acid-metabolizing bacteria (Bacteroides, Faecalibacterium, Roseburia) and propionic acid-metabolizing bacteria (Prevotella, Megamonas, Coprococcus) decreased significantly, suggesting that short-chain fatty acid-related bacteria are the key factor in type 2 diabetes mellitus. Notably, Escherichia-Shigella is a bacterial pathogen associated with bacterial dysentery, which aligns with clinical symptoms such as persistent diarrhea in diabetes patients. In addition, the levels of gluconeogenesis and glycolysis were significantly increased, suggesting that the host's hyperglycemic environment is related to the active glucose metabolism of the intestinal flora. 3. Thirty-one potential bacterial markers were selected from the Parkinson’s syndrome group. The abundance of short-chain fatty acid-producing bacteria and anti-inflammatory bacteria decreased significantly, suggesting that the intestinal flora is one of the causes of gastrointestinal symptoms such as enteritis and malnutrition in Parkinson’s syndrome patients. Contrary to previous results from Japan, the abundance of the aging marker Eubacterium was significantly increased in our Chinese samples. In addition, the level of gluconeogenesis 
was significantly up-regulated, suggesting that diabetes is a potential risk factor for Parkinson’s syndrome. 4. Thirty-three potential bacterial markers were selected from the cholecystitis group; among them, Escherichia-Shigella reached an abundance ratio as high as 32.59%, which may be the main cause of long-term diarrhea in cholecystitis patients. Additionally, the cobalamin biosynthesis level in the cholecystitis group was significantly reduced, which leads to an increase in methylmalonic acid and reduces the inhibition of fatty acid synthesis, indicating an association with abnormalities in fatty acid and cholesterol metabolism. 5. The accuracy rates of the disease diagnosis models were 91.01% for drug use disorder, 95.55% for type 2 diabetes, 93.79% for Parkinson’s syndrome, and 94.67% for cholecystitis. The multi-factor individual identification model achieved an accuracy of 77% for disease, 73% for gender, and 76% for region. The diagnostic accuracy is comparable to the most advanced levels reported domestically and internationally. The age regression model had an error of ±8.4 years, the height model had an error of ±6.7 cm, the weight model had an error of ±9.6 kg, and the BMI model had an error of ±2.7. This study is the first to systematically analyze gut microbiota diversity and function in four diseases across four regions of China at the same time. It established 4 disease diagnostic models and 7 individual identification models including disease, gender, region, age, weight, height, smoking and alcohol habits, and BMI. These models demonstrated certain capabilities in disease diagnosis and individual identification, which lays a solid theoretical foundation for applications such as early disease screening and warning, auxiliary diagnosis, fecal transplantation treatment, forensic individual identification and appraisal, and suspect tracking. |