| Background: Thyroid cancer (TC) has been the fastest growing solid malignancy over the past 20 years, with an average annual increase in incidence of 6.2%, and is the fifth most common malignancy in women; 80% of new cases are papillary thyroid cancer (PTC). The tests currently used in clinical practice have shortcomings such as false positives, false negatives, radiation exposure and overdiagnosis. A large number of studies have shown that the occurrence and development of tumors are accompanied by changes in oligosaccharide structure. In this study, salivary glycoproteins from PTC patients before and after surgery, patients with benign thyroid nodule (BTN) and healthy volunteers (HV) were analyzed by lectin microarray. Based on the salivary glycoprotein data, several machine learning algorithms were used to construct PTC diagnostic models in order to evaluate salivary glycoproteins as potential diagnostic and prognostic biomarkers for PTC patients. Methods: A total of 105 saliva samples were collected, comprising 30 HV, 22 BTN, 27 PTC and 26 paired postoperative PTC samples. First, each sample was analyzed individually by lectin microarray, and the lectins with different signals among the HV, BTN and PTC groups were screened; the salivary glycoproteins of HV and of PTC patients before and after surgery were then compared, and the results were verified by lectin blotting. Next, the HV, BTN and PTC samples were divided into a training set and a testing set at a ratio of 7:3. K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Random Forest (RF), Logistic Regression (LR) and Support Vector Machine (SVM) models were constructed on the training set, hyperparameters were optimized by 5-fold cross-validation, and the final models were trained. Finally, the models were evaluated on the testing set using the confusion matrix, accuracy, recall, precision and receiver operating characteristic (ROC) curve analysis. Results: (1) The lectin microarray showed that the normalized fluorescent intensities (NFIs) of 17 lectins differed significantly among HV, BTN and PTC. Compared with HV, the NFIs of 10 lectins differed significantly in BTN or PTC but did not differ between BTN and PTC; these included PHA-E (binding biantennary complex-type N-glycans with outer Gal), PSA (binding α-D-Man, Fucα1-6GlcNAc and α-D-Glc), LCA (binding α-D-Man, Fucα1-6GlcNAc and α-D-Glc) and VVA (binding terminal GalNAc, GalNAcα-Ser/Thr (Tn) and GalNAcα1-3Gal). The NFIs of 7 lectins differed significantly between BTN and PTC, including PTL-I (binding GalNAc, GalNAcα-1,3Gal and GalNAcα-1,3Galβ-1,3/4Glc), LTL (binding Fucα1-2Galβ1-4GlcNAc and Fucα1-3(Galβ1-4)GlcNAc) and SJA (binding terminal GalNAc and Gal). The lectin blotting result for SJA was consistent with the lectin microarray. (2) The NFIs of 6 lectins were up-regulated or down-regulated after surgery: Jacalin (binding GlcNAcβ1-3GalNAcα-Ser/Thr), MPL (binding GalNAcα-Ser/Thr and GalNAc), PNA (binding Galβ1-3GalNAcα-Ser/Thr), EEL (binding Galα1-3(Fucα1-2)Gal), DSA (binding β1-4GlcNAc and LacNAc) and SNA (binding Siaα2-6Gal/GalNAc). Compared with HV, the postoperative NFIs of these 6 lectins tended to return to HV levels. The lectin blotting result for Jacalin was consistent with the lectin microarray. (3) Model construction (KNN, MLP, RF, LR and SVM) and hyperparameter optimization (5-fold cross-validation) were carried out on the training set, and the models were then evaluated on the testing set for their ability to distinguish HV, BTN and PTC. On the testing set, the best performing model was the SVM, with an AUC of 0.94, a sensitivity of 0.92, a specificity of 0.96, a recall of 93.94%, a precision of 82.01% and an accuracy of 92%. It correctly classified 9 of 9 HV, 5 of 7 BTN and 9 of 9 PTC samples. These results indicate that salivary glycopatterns may serve as biomarkers for PTC screening and prognosis, and that machine learning may help improve diagnostic accuracy. |
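
As a quick consistency check on the reported testing-set figures, the per-class counts for the SVM model (9 of 9 HV, 5 of 7 BTN, 9 of 9 PTC) can be converted into overall accuracy and per-class recall. This is a minimal sketch that uses only those counts; the dictionary and function names are illustrative and do not come from the study. It does not attempt to reproduce the pooled recall and precision figures, because the abstract does not state how they were averaged across classes.

```python
from fractions import Fraction

# Per-class testing-set counts reported for the SVM model: (correct, total).
# Names below are illustrative assumptions, not taken from the study.
REPORTED_COUNTS = {
    "HV": (9, 9),
    "BTN": (5, 7),
    "PTC": (9, 9),
}


def overall_accuracy(counts):
    """Fraction of all test samples that were classified correctly."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return Fraction(correct, total)


def per_class_recall(counts):
    """Recall of each class: correctly classified / total in that class."""
    return {label: Fraction(c, t) for label, (c, t) in counts.items()}


accuracy = overall_accuracy(REPORTED_COUNTS)
recalls = per_class_recall(REPORTED_COUNTS)

print(f"accuracy = {float(accuracy):.2f}")
for label, value in recalls.items():
    print(f"recall[{label}] = {float(value):.3f}")
```

With these counts, 23 of 25 testing samples are classified correctly, which agrees with the reported accuracy of 92%; both misclassified samples belong to the BTN group.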