Study on Histological Grading of Breast Cancer Based on Multivariate Logistic Regression and Decision Tree

Posted on: 2020-09-21 | Degree: Master | Type: Thesis
Country: China | Candidate: H L Wu
GTID: 2404330596993439 | Subject: Applied statistics
Abstract/Summary:
Breast cancer is one of the most common malignant tumors threatening women's health and is more prevalent in Europe and the United States. Over recent decades, however, the incidence of female breast cancer has risen steadily in Asia, which has not traditionally been a high-incidence region. Although the incidence in China is still relatively low, it is increasing at twice the global rate, and the situation among women in China's coastal areas and first-tier cities is even more worrying. The incidence of malignant breast cancer is particularly high among women around 50 years of age.

To describe the degree of malignancy of breast cancer, this study focuses on histological grade (the degree of differentiation of the cancer cells). To identify which factors mainly affect histological grade, we selected breast-cancer-related variables from an American cancer database. Because of the large number of patient records and candidate variables, the data were first preprocessed, and variables were selected according to their significance in a multivariate logistic regression: variables with P values far greater than 0.05 were removed, and borderline variables were decided by comparing significance and accuracy in a second multivariate logistic regression. The retained variables comprise social factors, including characteristics of the patient's county, and pathological factors: Marital status at diagnosis, % <9th grade education ACS 2011-15, % <High school education ACS 2011-15, % Current Smoker, % Families below poverty ACS 2011-15, % Persons below poverty ACS 2011-15, Normalized cost-of-living index, tumor size, age at diagnosis, ER Status Recode Breast Cancer (1990+), PR Status Recode Breast Cancer (1990+), Derived HER2 Recode (2010+), Regional nodes examined (1988+), and Primary Site. These variables were used to analyze the severity of breast cancer in women aged 45 to 74 between 2011 and 2015.

The data were then split into a training set and a test set, and several types of decision trees were constructed and fitted for prediction. The C5.0 and CRT decision-tree algorithms were finally selected, achieving accuracies of 0.8003 and 0.7983 respectively, which indicates good performance. A high sensitivity of 0.9 was also obtained, so the model has good predictive value for patients whose cancer cells are better differentiated. Based on the analysis of how these variables affect histological grade, the following recommendations are made for women without the disease: (1) have regular check-ups; (2) pursue self-improvement, avoid smoking, and learn to relieve financial pressure.
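The two decision-tree algorithms named above differ mainly in how they score candidate splits, and the reported accuracy and sensitivity are computed from predictions on the held-out test set. The sketch below illustrates both in plain Python. It assumes a binary outcome (for example 1 = better differentiated, 0 = otherwise); the function names and toy labels are hypothetical and are not the thesis's data or code. CRT (CART) ranks a split by weighted Gini impurity, while C5.0, like C4.5, ranks it by information gain ratio.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a non-empty list of class labels (CART/CRT criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of a non-empty label list (basis of C4.5/C5.0)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_scores(parent, children):
    """Score splitting `parent` into non-empty `children` (lists of labels).

    Returns (weighted Gini after the split, information gain ratio).
    CRT prefers the lowest weighted Gini; C5.0 prefers the highest gain ratio.
    """
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    weighted_gini = sum(w * gini(ch) for w, ch in zip(weights, children))
    # Information gain: parent entropy minus weighted child entropy.
    gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Split information penalises splits into many small branches.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = gain / split_info if split_info > 0 else 0.0
    return weighted_gini, gain_ratio


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy and sensitivity (recall of the positive class) on a test set."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return correct / len(pairs), tp / (tp + fn)


if __name__ == "__main__":
    parent = [1, 1, 1, 1, 0, 0, 0, 0]
    children = [[1, 1, 1, 0], [1, 0, 0, 0]]
    print(split_scores(parent, children))
    print(accuracy_and_sensitivity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

A full tree applies these scores recursively to every candidate split of every predictor. Production implementations also add pruning, missing-value handling and, for C5.0, optional boosting, none of which this sketch attempts.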
Keywords/Search Tags: Breast cancer, Multiple Logistic Regression, Decision Tree, C5.0, CRT