Application Of Data Mining Technology In Protein Biomarker Discovery

Posted on:2022-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Yang

GTID:2480306479478974

Subject:Biochemistry and Molecular Biology

Abstract/Summary:

The rapid development of proteomics has provided unprecedented opportunities for the discovery of cancer biomarkers. Because mass spectrometry data are high-dimensional and sparse, this study used data mining techniques to establish a protein biomarker discovery workflow, which we named PBMiner. The workflow consists of four modules: data preprocessing, feature selection, modeling and classifier assessment, and finally biomarker panel lockdown. We applied PBMiner to a mass spectrometry dataset of diffuse large B-cell lymphoma (DLBCL) and identified a diagnostic panel of 6 proteins (PALD1, TBC1D4, TNFAIP8, CMAS, MME and PTPN1), which we used to construct a random forest model and a non-linear SVM model. Both models achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 1 on both the training set and the testing set, completely separating the two subtypes of DLBCL. We also applied PBMiner to a recent mass spectrometry dataset of lung adenocarcinoma (LUAD) and identified a diagnostic panel of 19 proteins (ABCF1, LAMC1, SRP72, AGER, etc.). The random forest model built on this panel had an AUC of 1 on the training set and 0.99 on the testing set, and distinguished tumor from para-tumor tissue with at least 97.5% accuracy. In conclusion, PBMiner provides a fast and effective pipeline for exploring protein markers for diagnosis and molecular stratification.
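The abstract names the four PBMiner modules but not the specific methods used inside them. As a rough illustration of the first steps and of the AUC metric used to assess the panels, here is a minimal, standard-library-only Python sketch. Everything in it is an assumption rather than PBMiner's actual code: the 50% missing-value cutoff, half-minimum imputation, log2 transform and per-protein AUC filter are common proteomics choices picked for illustration, the function names are hypothetical, and the random forest and non-linear SVM models are not reproduced (they could be fit on the selected panel with a machine learning library). `roc_auc` computes AUC as the Mann-Whitney probability that a randomly chosen positive sample outscores a randomly chosen negative one, which equals 1 exactly when the two classes are perfectly separated.

```python
import math
from typing import Dict, List, Optional, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample scores higher, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def preprocess(data: Dict[str, List[Optional[float]]],
               max_missing: float = 0.5) -> Dict[str, List[float]]:
    """Hypothetical preprocessing: sparsity filter, half-minimum imputation, log2.

    Assumes raw intensities are strictly positive; None marks a missing value.
    """
    kept: Dict[str, List[float]] = {}
    for protein, values in data.items():
        observed = [v for v in values if v is not None]
        # Drop proteins missing in more than `max_missing` of the samples.
        if not observed or len(observed) < (1.0 - max_missing) * len(values):
            continue
        fill = min(observed) / 2.0  # stand-in for "below detection limit"
        kept[protein] = [math.log2(v if v is not None else fill) for v in values]
    return kept


def rank_proteins(data: Dict[str, List[float]], labels: Sequence[int],
                  top_k: int) -> List[str]:
    """Hypothetical univariate filter: rank by |single-protein AUC - 0.5|.

    An AUC near 0 is as discriminative as one near 1 (the protein is simply
    lower in the positive class), so distance from 0.5 is used. Ties are
    broken alphabetically so the output is deterministic.
    """
    strength = {p: abs(roc_auc(labels, v) - 0.5) for p, v in data.items()}
    return sorted(strength, key=lambda p: (-strength[p], p))[:top_k]


if __name__ == "__main__":
    # Toy data (invented for illustration): 3 class-1 and 3 class-0 samples.
    labels = [1, 1, 1, 0, 0, 0]
    raw = {
        "P1": [10.0, 12.0, 11.0, 2.0, 3.0, 2.5],   # higher in class 1
        "P2": [5.0, 4.0, None, 5.5, 4.5, 5.0],     # one value imputed
        "P3": [None, None, None, None, 1.0, 2.0],  # too sparse, dropped
        "P4": [1.0, 1.5, 1.2, 8.0, 9.0, 7.0],      # higher in class 0
    }
    clean = preprocess(raw)
    panel = rank_proteins(clean, labels, top_k=2)
    print(sorted(clean))  # ['P1', 'P2', 'P4']
    print(panel)          # ['P1', 'P4']
```

The pairwise loop in `roc_auc` costs O(n_pos × n_neg), which is acceptable for cohort-sized datasets; a rank-based formula scales better for large sample counts. A filter such as `rank_proteins` should be run on the training set only, since selecting proteins on the full dataset before splitting would leak test information into the reported AUC.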

Keywords/Search Tags:

proteomics, biomarker, data mining, machine learning

Related items

1	The Application Of Data Mining And Machine Learning In Astronomy
2	Research On Intelligent Clinical Decision-Making Based On Machine Learning Method
3	Research On Commonality Of Fungal SRNA Transboundary Regulation Mechanism Based On Machine Learning
4	Establishment Of The Platform For Organelle Protein Profiling With Data Mining And Application In Human Liver Nuclear Proteome Research
5	The Study Of High Resolution Remote Sensing Image Classification Based On Extreme Learning Machine
6	Machine Learning Based On Proteomic Data To Predict Lung Cancer Recurrence
7	Refined Temperature Prediction Technology Based On Machine Learning And Data Mining
8	Mining Probiotic Genome Molecular Markers And Constructing A Visual Screening Prediction Platform Based On Machine Learning
9	Data Mining And Behavior Recognition Model Of Fishing Vessel Based On Machine Learning
10	Research On Prediction Model Of Plant Moonlighting Protein Based On Machine Learning