
Application of Data Mining Technology in Protein Biomarker Discovery

Posted on: 2022-08-31 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Yang
GTID: 2480306479478974 | Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The rapid development of proteomics has provided unprecedented opportunities for the discovery of cancer biomarkers. Because mass spectrometry data are high-dimensional and sparse, this study established a protein biomarker discovery workflow based on data mining techniques, which we named PBMiner. The workflow consists of four modules: data preprocessing, feature selection, modeling and classifier assessment, and finally biomarker panel lockdown. We applied PBMiner to a mass spectrometry dataset of diffuse large B-cell lymphoma (DLBCL) and identified a diagnostic panel of 6 proteins (PALD1, TBC1D4, TNFAIP8, CMAS, MME and PTPN1), which we used to build a random forest model and a nonlinear SVM model. Both models reached an area under the receiver operating characteristic (ROC) curve (AUC) of 1 on both the training set and the testing set, completely separating the two DLBCL subtypes. We also applied PBMiner to a recent mass spectrometry dataset of lung adenocarcinoma (LUAD) and identified a diagnostic panel of 19 proteins (ABCF1, LAMC1, SRP72, AGER, etc.). The random forest model built on this panel had an AUC of 1 on the training set and 0.99 on the testing set, and distinguished tumor from para-tumor tissue with at least 97.5% accuracy. In conclusion, PBMiner provides a fast and effective pipeline for exploring protein markers for diagnosis and molecular stratification.
Keywords/Search Tags: proteomics, biomarker, data mining, machine learning
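The four PBMiner modules (preprocessing, feature selection, modeling, assessment) can be illustrated with a minimal sketch in plain Python. The abstract does not say which concrete methods PBMiner uses, so everything below is an assumption chosen for brevity. Proteins missing in too many samples are dropped and remaining gaps are filled with half the minimum observed intensity. Proteins are ranked by Welch's t-statistic. A nearest-centroid classifier stands in for the random forest and SVM models. The AUC is computed in its Mann-Whitney form, so an AUC of 1 means every positive sample scores above every negative one. All function names (`fit_preprocessing`, `select_features`, `fit_centroids`, and so on) are hypothetical, and the toy data are invented rather than taken from the DLBCL or LUAD datasets.

```python
import math
from statistics import mean, variance

# Minimal sketch of a four-step biomarker workflow on toy data.
# Samples are dicts {protein: intensity or None}; labels are 0/1.
# Every rule and threshold below is an illustrative assumption,
# not the method actually used by PBMiner.


def fit_preprocessing(samples, max_missing=0.5):
    """Keep proteins observed in at least (1 - max_missing) of the samples
    and record a fill value (half the minimum observed intensity) for each."""
    proteins = sorted({p for s in samples for p in s})
    fill = {}
    for p in proteins:
        observed = [s[p] for s in samples if s.get(p) is not None]
        if observed and len(observed) >= (1 - max_missing) * len(samples):
            fill[p] = min(observed) / 2
    return fill


def apply_preprocessing(samples, fill):
    """Restrict samples to the kept proteins and impute missing values."""
    return [{p: s[p] if s.get(p) is not None else v for p, v in fill.items()}
            for s in samples]


def welch_t(a, b):
    """Welch's t-statistic between two groups (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def select_features(matrix, labels, k):
    """Rank proteins by |t| between the two classes and keep the top k."""
    scores = {}
    for p in matrix[0]:
        pos = [row[p] for row, y in zip(matrix, labels) if y == 1]
        neg = [row[p] for row, y in zip(matrix, labels) if y == 0]
        scores[p] = abs(welch_t(pos, neg))
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


def fit_centroids(matrix, labels, features):
    """Nearest-centroid stand-in for the random forest / SVM models."""
    return {cls: {p: mean(r[p] for r, y in zip(matrix, labels) if y == cls)
                  for p in features}
            for cls in (0, 1)}


def decision_score(row, centroids, features):
    """Higher score = closer to the class-1 centroid than to class 0."""
    def dist(c):
        return math.sqrt(sum((row[p] - c[p]) ** 2 for p in features))
    return dist(centroids[0]) - dist(centroids[1])


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: P1 separates the classes, P2 is noise, P3 is mostly missing.
train = [
    {"P1": 10.0, "P2": 5.0, "P3": None},
    {"P1": 11.0, "P2": 5.5, "P3": 2.0},
    {"P1": 10.5, "P2": None, "P3": None},
    {"P1": 2.0, "P2": 5.2, "P3": None},
    {"P1": 3.0, "P2": 4.8, "P3": 1.0},
    {"P1": 2.5, "P2": 5.1, "P3": None},
]
y_train = [1, 1, 1, 0, 0, 0]
test = [{"P1": 9.0, "P2": 5.0}, {"P1": 4.0, "P2": None},
        {"P1": 8.0, "P2": 5.3}, {"P1": 1.5, "P2": 4.9}]
y_test = [1, 0, 1, 0]

fill = fit_preprocessing(train)                 # drops P3 (4 of 6 missing)
X_train = apply_preprocessing(train, fill)
X_test = apply_preprocessing(test, fill)        # reuse training fill values
panel = select_features(X_train, y_train, k=1)  # the "locked" panel
model = fit_centroids(X_train, y_train, panel)
auc = roc_auc([decision_score(r, model, panel) for r in X_test], y_test)
print(panel, auc)  # ['P1'] 1.0
```

Fill values are learned on the training set and reused on the testing set so that no information from the test samples leaks into preprocessing. In a real workflow the nearest-centroid step would be replaced by the random forest or SVM models named above, and the panel size would be tuned rather than fixed at `k=1`.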