
Some Integrative Applications of Statistical Models and Machine Learning Methods in Interdisciplinary Research

Posted on: 2020-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Tang
GTID: 2370330575466410
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Classical statistical models are usually easy to interpret, but they sometimes predict poorly. Machine learning methods, by contrast, predict well on some problems, but it is often difficult to explain the underlying mechanism from them. For practical problems, a proper combination of statistical models and machine learning methods helps both to study the problem further and to reveal its mechanism. This thesis explores the integration of statistical models and machine learning through two interdisciplinary studies: "the synthesis and growth mechanism of metal-organic frameworks" and "the analysis of factors related to the recurrence of esophageal cancer".

In the study of the synthesis and growth mechanism of metal-organic frameworks, we focus on classifying the phase and predicting the thickness of the target products. For phase classification, a random forest achieves accurate classification, with a Kappa value of 0.86. For thickness prediction, we retain the good explanatory power of the linear regression model: cluster analysis is used to find the region of the sample space where the linear regression model performs best, and discriminant analysis and logistic regression are then used to characterize that region. To explore the reaction mechanism further, intermediate products were introduced. Regression models with stepwise variable selection were fitted to study the relationship between reactants and intermediate products, so that the amount of intermediate products can be predicted from data that contain only reactants and target products. Finally, we recast thickness prediction as a two-class "thick-thin" problem and build a random forest model on the reactants and the predicted intermediate products. In comparison, the predictive performance of this random forest model was significantly better than that of the reactant-only model: the Kappa value increased from 0.4067 to 0.6179. This suggests that the introduced intermediates play a key role in the study of thickness.

In the analysis of factors related to the recurrence of esophageal cancer, we first carried out a univariate analysis, using the Pearson chi-square test, Fisher's exact test and the log-rank test to explore the association of age, sex and lesion length with tumor-bed recurrence, anastomotic recurrence, distant metastasis and survival time. The anastomotic recurrence rate was higher in patients with a positive incision margin (P=0.064), and the distant metastasis rate was higher in patients with long lesions (P=0.091) or ulcerative tumors (P=003). Patients with long lesions (P=0.068), a larger number of lymph nodes (P=0.081) or a positive incision margin (P=0.015) also had a shorter survival time. Based on the results of the univariate analysis, some variables were carried forward into a multivariate analysis. A logistic regression model and a Cox proportional hazards model were fitted, and lesion length, tumor type and incision type were found to be risk factors. Patients with long lesions and ulcerative tumors had a higher risk of distant metastasis, while patients with long lesions and a positive incision margin had a shorter survival time. Finally, the tree-based machine learning algorithm iRF was used to explore possible interactions among variables. Interactions were found between T stage and number of lymph nodes for tumor-bed recurrence, between lesion length and number of lymph nodes for distant metastasis, and between lesion length and number of lymph nodes for anastomotic recurrence.
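The abstract reports model quality as Kappa values (0.86, 0.4067, 0.6179) without defining the statistic. Assuming these are Cohen's kappa, which the abstract does not state, the quantity compares the observed agreement between true and predicted labels with the agreement expected by chance. The following is a minimal sketch in Python; the function name `cohen_kappa` and the "thick"/"thin" labels are hypothetical, invented for illustration, and are not data from the thesis.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where the two labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both labelings consist of one identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (assumption: illustration only, not thesis data).
y_true = ["thick", "thick", "thin", "thin", "thick", "thin", "thin", "thick"]
y_pred = ["thick", "thin", "thin", "thin", "thick", "thin", "thick", "thick"]
kappa = cohen_kappa(y_true, y_pred)
```

Here 6 of 8 labels agree (observed agreement 0.75) and both labelings are balanced (chance agreement 0.5), so `kappa` is 0.5. A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why a rise from 0.4067 to 0.6179 indicates a real gain rather than an effect of class balance alone.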
Keywords/Search Tags: Statistical Model, Machine Learning, Integrated Application