| Breast cancer is a common disease in women. In recent years, its incidence has risen year by year, and the age of onset has been getting earlier. Breast cancer has now surpassed lung cancer and cervical cancer to become the leading cancer in women in both incidence and mortality. Research on the diagnosis and discrimination of breast cancer is therefore very important. In the traditional diagnostic method, knowledgeable and experienced physicians make judgments based on the patient's examination report. This approach involves subjective factors and is not highly efficient. With the development of science and technology, statistical models can be used as an auxiliary tool for breast cancer diagnosis to improve its efficiency and accuracy. In medical and clinical diagnosis, the logistic regression model plays a central role and is the first choice for building many biomedical models. The usual logistic regression model constructs a likelihood function from the experimental data and estimates the model parameters by maximizing that likelihood. This thesis also uses the logistic regression model for breast cancer diagnosis and discrimination, but proposes a different approach to parameter estimation: the MCMC algorithm. This method constructs a Markov chain that converges to a stationary distribution, performs a Monte Carlo simulation of the parameters through iterative sampling, and finally estimates the entire posterior distribution of each parameter. The estimated distribution provides not only a point estimate of the parameter but also a more intuitive picture of how the parameter is distributed. Compared with point estimation, the MCMC algorithm therefore yields more information about the parameter. Moreover, the MCMC algorithm is grounded in Bayesian thinking, which effectively combines prior knowledge of the parameter with the information in the data. For this reason, it also works well in the small-sample settings common in the medical field. After introducing the relevant background, the thesis compares methods in numerical simulation experiments and finds that the MCMC algorithm yields estimates close to those of other parameter estimation methods, and that adding appropriate prior information on the parameters can effectively improve estimation accuracy. It is therefore reasonable to use the MCMC algorithm for parameter estimation. In the empirical analysis, this thesis builds a model on the breast cancer patient data collected by the University of Wisconsin and uses the MCMC algorithm to estimate the parameters of the logistic regression model. The final trained model reaches an accuracy of 97.37% and an AUC of 99.56% on the test set. In addition, the estimated model parameters reveal the influence of each patient index on the diagnosis result, which can provide guidance and assistance for the diagnosis and discrimination of breast cancer. |
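
The idea of estimating logistic regression parameters by MCMC can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a random-walk Metropolis sampler (one of several MCMC variants), an independent zero-mean Gaussian prior on each coefficient, and synthetic data in place of the Wisconsin data set. The names `log_posterior`, `metropolis_logistic`, `step`, and `prior_sd` are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_posterior(beta, X, y, prior_sd):
    """Unnormalized log posterior: Bernoulli log-likelihood plus Gaussian log-prior."""
    z = X @ beta
    # sum(y*z - log(1 + exp(z))), using logaddexp for numerical stability
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    # Assumed prior: independent N(0, prior_sd^2) on each coefficient
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return log_lik + log_prior


def metropolis_logistic(X, y, n_iter=5000, step=0.1, prior_sd=10.0, seed=0):
    """Random-walk Metropolis sampler; returns all draws and the acceptance rate."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y, prior_sd)
    samples = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for i in range(n_iter):
        # Symmetric Gaussian proposal centred on the current state
        proposal = beta + step * rng.standard_normal(beta.shape)
        candidate = log_posterior(proposal, X, y, prior_sd)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        samples[i] = beta
    return samples, accepted / n_iter


# Synthetic data (an assumption for illustration): intercept plus two features
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
true_beta = np.array([-0.5, 1.5, -1.0])
prob = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < prob).astype(float)

samples, acc_rate = metropolis_logistic(X, y)
posterior = samples[1000:]  # discard the first 1000 draws as burn-in
post_mean = posterior.mean(axis=0)
ci_low, ci_high = np.percentile(posterior, [2.5, 97.5], axis=0)

print("acceptance rate:", round(acc_rate, 3))
print("posterior mean:", np.round(post_mean, 3))
print("95% interval low:", np.round(ci_low, 3))
print("95% interval high:", np.round(ci_high, 3))
```

The retained draws approximate the whole posterior distribution, so the same array gives both a point estimate (the posterior mean) and interval summaries. Choosing a smaller `prior_sd` is one way to encode stronger prior information about the coefficients.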