| Objective: This study aimed to explore the risk factors for early intraductal malignancy-related lesions in patients with pathological nipple discharge (PND) and to construct machine learning (ML) prediction models that predict pathological results and guide clinical diagnosis and treatment. Methods: This study retrospectively analyzed clinicopathologic data of patients with PND who received surgical treatment in the Department of Breast Surgery at China-Japan Union Hospital of Jilin University from 2016-06-01 to 2019-04-30. Patients were divided into an intraductal benign lesion group and an early intraductal malignancy-related lesion group. The Chi-square test, Fisher exact test, and logistic regression analysis were used to analyze the risk factors for early intraductal malignancy-related lesions. The clinical features of all diseased breasts with PND were entered into PyCaret, a Python machine learning library. After comparing 14 ML models, the Random Forest (RF) model, which had the highest AUC value, was selected to build the prediction model. The model was evaluated using the ROC curve and AUC value, the confusion matrix, and feature importance. Results: A total of 300 patients with PND (355 diseased breasts) were included, comprising 259 intraductal benign cases (72.96%) and 96 early intraductal malignancy-related cases (27.04%). Chi-square analysis showed statistically significant differences between the two groups in age, BMI, history of hypertension, menopausal status, nature of the discharge, change in the nature of the discharge, tumor with hyperemic surface, tumor size, shape, boundary, and BI-RADS category, and malignant microcalcification. Multivariate logistic regression analysis showed that age 35-49 years (OR=6.836, p=0.024), age ≥50 years (OR=8.473, p=0.032), hemorrhagic discharge (OR=2.316, p=0.006), and tumor size ≥1 cm (OR=4.246, p=0.024) were independent risk factors for early intraductal malignancy-related lesions. The machine learning results indicated that the Random Forest model had good predictive value: the ROC curve showed that the AUC values for both benign and malignant classifications reached 0.88; the confusion matrix showed that the RF model had an accuracy of 85.2%, a precision of 91.7%, a sensitivity (recall) of 61.1%, a specificity of 97.2%, and an F1 value of 73.3%; and the feature importance plot suggested that malignant calcification signs on mammography, diffuse distribution of ductal masses, and a mammographic BI-RADS category ≥4 were all of evident importance in correctly predicting the outcome of intraductal lesions in patients with PND. Conclusion: For patients with PND who are aged ≥35 years and present with hemorrhagic nipple discharge or a tumor size ≥1 cm, clinicians should be highly alert to the possibility of early intraductal malignant lesions. The RF model has high accuracy in predicting intraductal lesions in patients with PND and can assist clinicians in identifying early intraductal malignant lesions. |
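
The confusion-matrix metrics quoted in the Results are all derived from four counts (true positives, false positives, false negatives, true negatives). The abstract does not report those counts, so the numbers in the sketch below are hypothetical: they were chosen only because they happen to reproduce the reported percentages, and they should not be read as the study's actual test-set data. The function name `confusion_metrics` is likewise an illustrative assumption.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    The positive class is taken to be the malignant group.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,         # share of all cases classified correctly
        "precision": tp / (tp + fp),           # predicted malignant that are malignant
        "recall": tp / (tp + fn),              # sensitivity: malignant cases detected
        "specificity": tn / (tn + fp),         # benign cases correctly ruled out
        "f1": 2 * tp / (2 * tp + fp + fn),     # harmonic mean of precision and recall
    }


# Hypothetical counts (NOT from the study), picked to be consistent
# with the percentages reported in the abstract.
metrics = confusion_metrics(tp=11, fp=1, fn=7, tn=35)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, the gap between specificity and sensitivity is easy to see: few benign cases are flagged as malignant, but a sizeable share of malignant cases is missed, which is worth keeping in mind when reading the overall accuracy figure.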