
Bioinformatics- and Machine-Learning-Based Study of Risk Factors for Calcific Aortic Valve Stenosis and Construction of Prediction Models

Posted on: 2023-08-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Song
GTID: 1524306773962449
Subject: Surgery
Abstract/Summary:
Objective: Calcific aortic valve stenosis (CAVS) is a degenerative disease involving abnormal lipid metabolism, inflammatory cell infiltration, and abnormal ossification of the valve. It is characterized by a reduced effective aortic valve opening area caused by calcium deposition, and by the hemodynamic changes that follow. To date, the main treatment options for CAVS are surgical aortic valve replacement and transcatheter aortic valve implantation; no medical therapy has been shown to effectively delay disease progression. As social demographics and the disease spectrum change, the prevalence of CAVS will probably continue to increase and place a growing burden on the public health system, so it is especially important to shift CAVS care from surgical intervention toward preventive management. It is therefore clinically important to screen for the molecular signaling pathways involved in CAVS progression and for hub genes and noncoding RNAs that may play regulatory roles, and to further explore the molecular mechanisms of aortic stenosis in order to identify molecular targets for early diagnosis, prevention, and specific treatment. Meanwhile, rapid and efficient prediction of CAVS occurrence based on risk factors may help to identify the potentially affected population early, or help specific populations take preventive measures early. To build such prediction models, however, the complex clinical variables must be assigned reasonable and optimal weights. The objectives of this study were:
1. To identify CAVS-related differentially expressed genes (DEGs), enriched signaling pathways, protein-protein interaction (PPI) networks, and potential regulatory miRNAs using publicly available databases, and to validate the DEGs in external datasets.
2. Based on the CAVS risk factors identified to date, combined with the signaling pathways enriched among the DEGs obtained by data mining, to screen common clinical features, evaluate the discriminative performance and generalization ability of prediction models built with nine classical machine learning algorithms, and obtain the machine learning model with the best predictive performance.
3. Building on the ability to identify the population potentially at risk of CAVS, to introduce a multilayer perceptron neural network to predict the risk of disease and of disease progression, and to compare it with the best-performing classical machine learning algorithm from the second part, in order to explore the predictive performance of classical machine learning models and neural network models at different stages of CAVS progression.

Methods:
1. In the first part, the GEO database was searched for CAVS-related datasets and expression-profiling microarray data, and CAVS-related DEGs were screened based on the P-value and log fold change (logFC). Comprehensive bioinformatics analysis followed, including Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, PPI network construction, co-expression network analysis, and screening of hub genes and potential miRNAs; the hub genes were then validated in external datasets.
2. In the second part, with the occurrence of CAVS as the binary dependent variable and 42 screened CAVS-related clinical features as independent variables, CAVS prediction models were built with nine machine learning algorithms: Logistic Regression, K-Nearest Neighbors, Random Forest, Support Vector Machine, XGBoost, LightGBM, AdaBoost, Gaussian Naive Bayes, and Complement Naive Bayes. The models were then comprehensively evaluated in terms of discriminative performance, learning curves, external generalization ability, and the magnitude of the feature weights.
3. In the third part, a multilayer perceptron neural network was used for multiclass prediction of the overall risk of CAVS occurrence and of stenosis severity, and it was compared with the random forest algorithm selected in the second part to evaluate the performance of these two artificial intelligence approaches in predicting CAVS stenosis.

Results:
1. Intersecting the DEGs from three datasets (GSE153555, GSE12644, and GSE51472) yielded 78 DEGs: 58 up-regulated and 20 down-regulated. The signaling pathways were mainly enriched in matrix remodeling and the immune system, and the pathways linked to clinical features included immune-related processes, redox reactions and energy metabolism, neurotransmitter regulation, electrolyte regulation, aldosterone production and metabolism, and diabetes-related metabolic pathways. Ten hub genes (IBSP, NCAM1, MMP9, FCGR3B, COL4A3, FCGR1A, THY1, RUNX2, ITGA4, and COL10A1) and their co-expression modules were identified from the PPI network; the signaling pathways of the co-expression modules were mainly concentrated in extracellular matrix regulation, endodermal cell differentiation, cell adhesion molecule binding, and leukocyte transendothelial migration. Screening five miRNA databases identified hsa-miR-1276 as a potential regulatory miRNA, and NetworkAnalyst was used to construct the miRNA-gene regulatory network. Finally, all DEGs and hub genes were externally validated in the GSE83453 dataset, in which eight hub genes (IBSP, NCAM1, MMP9, FCGR3B, COL4A3, FCGR1A, THY1, and COL10A1) were significantly differentially expressed.
2. In the second part, the nine classical machine learning models were compared using ROC curves, learning curves, calibration curves, decision curve analysis (DCA), differences in feature weights, and generalization ability. CAVS risk factors including homocysteine, alkaline phosphatase, Lp(a), serum phosphorus, creatinine, bicuspid aortic valve, and age were selected. Across the feature-variable combinations evaluated, the random forest and XGBoost models showed excellent discrimination, superior learning efficiency, and good stability in CAVS prediction. For random forest, the F1 score, sensitivity, and specificity in the training set were 0.813, 92.1%, and 92.5%, respectively; the AUC in the validation set was 0.9670±0.0170; and the AUC and accuracy in the test set were 0.985 and 0.928, respectively. XGBoost provided accurate and stable CAVS prediction using only six common features and therefore has better clinical applicability. In addition, the 10 variables with the highest weights in the random forest and XGBoost models were combined into a CAVS risk prediction nomogram, and a fully open-access online calculator was developed to facilitate risk assessment by clinicians and potentially at-risk groups.
3. In the third part, model performance was evaluated by accuracy, recall, and F1 score, and both models performed well overall. Both achieved high accuracy and F1 scores in the normal population and could reliably identify individuals without disease, but some misclassification occurred in the subgroup classification of the diseased population, where the random forest model outperformed the multilayer perceptron neural network.

Conclusions:
1. The signaling pathways of the DEGs were mainly concentrated in matrix remodeling and the immune system, while those of the PPI co-expression modules were mainly concentrated in the regulation of the extracellular matrix and inflammatory mediators. The tissue-specific expression of the DEGs suggests that the immune system may play an important role in CAVS progression. Eight hub genes (IBSP, NCAM1, MMP9, FCGR3B, COL4A3, FCGR1A, THY1, and COL10A1) were validated in the GSE83453 dataset, and hsa-miR-1276 may be a noncoding RNA with a potential regulatory effect. Clinical features associated with the pathway enrichment results included homocysteine, alkaline phosphatase, Lp(a), serum phosphorus, and creatinine. The identified hub genes, enriched signaling pathways, and potential miRNAs may provide new directions for research on CAVS interventions.
2. The random forest and XGBoost models had the best predictive performance, suggesting that bicuspid aortic valve, homocysteine, alkaline phosphatase, ApoA1, ApoB, LDL, lipoprotein(a), and blood calcium, phosphorus, and creatinine levels carry high weight in CAVS risk prediction. Both models achieved AUCs above 0.96 in the validation set and generalized well. Combined with the online dynamic nomogram, the models offer both theoretical accuracy and practical applicability.
3. In discriminating CAVS severity, random forest showed better predictive performance than the multilayer perceptron neural network. Both models showed some misclassification between mild-to-moderate and severe CAVS stenosis; however, the performance of the neural network model improved markedly as the sample size increased, so the multilayer perceptron neural network still has considerable room for improvement in CAVS prediction.
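The first part of the study screened DEGs by P-value and logFC and then intersected the lists from GSE153555, GSE12644, and GSE51472. The abstract does not state the cutoffs or the statistical test, so the sketch below assumes per-gene (logFC, adjusted P-value) pairs have already been computed for each dataset and uses illustrative cutoffs of P < 0.05 and |logFC| ≥ 1. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical per-dataset DEG table: gene -> (log2 fold change, adjusted P-value).
DegTable = Dict[str, Tuple[float, float]]


def screen_degs(table: DegTable, p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Split genes into up- and down-regulated sets by P-value and |logFC|.

    The default cutoffs are illustrative assumptions, not the study's thresholds.
    """
    up, down = set(), set()
    for gene, (logfc, p) in table.items():
        if p >= p_cutoff:
            continue  # not statistically significant
        if logfc >= lfc_cutoff:
            up.add(gene)
        elif logfc <= -lfc_cutoff:
            down.add(gene)
    return up, down


def intersect_degs(tables: Iterable[DegTable], **cutoffs) -> Tuple[Set[str], Set[str]]:
    """Keep genes called up (or down) in every dataset; needs at least one table."""
    ups, downs = zip(*(screen_degs(t, **cutoffs) for t in tables))
    return set.intersection(*ups), set.intersection(*downs)
```

Requiring the same direction in every dataset is one conservative reading of "intersection"; a gene absent from any one dataset simply drops out.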
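The second part compared nine classical algorithms on 42 clinical features. A minimal scikit-learn sketch of such a comparison, scored by stratified cross-validated ROC AUC, is shown below. It runs on synthetic data from `make_classification`, not on the study's cohort, and it covers only the seven algorithms that ship with scikit-learn; XGBoost and LightGBM would need their own packages. All hyperparameters are assumptions rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import ComplementNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC


def candidate_models(seed: int = 0):
    """Seven of the nine algorithms named in the abstract (assumed settings).

    xgboost.XGBClassifier and lightgbm.LGBMClassifier could be added to this
    dict if those packages are installed.
    """
    return {
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC(random_state=seed)),
        "adaboost": AdaBoostClassifier(random_state=seed),
        "gaussian_nb": GaussianNB(),
        # ComplementNB needs non-negative inputs, so clip-scale features to [0, 1].
        "complement_nb": make_pipeline(MinMaxScaler(clip=True), ComplementNB()),
    }


def compare_models(X, y, seed: int = 0, folds: int = 5):
    """Return {model name: (mean AUC, SD of AUC)} over stratified folds."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    results = {}
    for name, model in candidate_models(seed).items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = (scores.mean(), scores.std())
    return results


if __name__ == "__main__":
    # Synthetic stand-in for 42 clinical features; NOT the study's data.
    X, y = make_classification(n_samples=400, n_features=42, n_informative=10, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: -kv[1][0])
    for name, (mean, sd) in ranked:
        print(f"{name:20s} AUC {mean:.3f} ± {sd:.3f}")
```

Reporting the mean and standard deviation across folds mirrors the "0.9670±0.0170" style of the validation-set AUC, although the study's exact validation scheme is not described in the abstract.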
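The random forest results are reported as F1 score, sensitivity, specificity, accuracy, and AUC. The plain-Python sketch below shows how these quantities follow from the confusion matrix and computes AUC through its rank (Mann-Whitney) interpretation. It assumes labels coded 1 = CAVS and 0 = no CAVS; the function names are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for CAVS (1) vs. no CAVS (0) predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall of the diseased class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": (tp + tn) / len(pairs)}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative
    (ties count one half); equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because F1 is built from precision, and precision falls as the diseased class becomes rarer, F1 can sit noticeably below sensitivity and specificity on an imbalanced sample.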
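Calibration curves were among the criteria used to compare the classical models. A calibration curve plots the mean predicted probability in each probability bin against the observed event rate in that bin. The minimal equal-width binning sketch below is similar in spirit to the `uniform` strategy of scikit-learn's `calibration_curve`; the bin count and the function name are assumptions.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y_true: Sequence[int], probs: Sequence[float],
                     n_bins: int = 10) -> List[Tuple[float, float]]:
    """Return (mean predicted probability, observed CAVS rate) per non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((t, p))
    curve = []
    for b in bins:
        if b:  # empty bins carry no information and are skipped
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(t for t, _ in b) / len(b)
            curve.append((mean_pred, observed))
    return curve
```

A well-calibrated model gives points close to the diagonal, where the mean predicted probability equals the observed rate.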
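Decision curve analysis (DCA) was also used to compare the models. In the standard formulation, the net benefit of acting on a model's predictions at a threshold probability p_t is TP/n − (FP/n)·p_t/(1 − p_t), and a decision curve compares this with treating everyone and with treating no one (net benefit 0). The plain-Python sketch below implements that formula; the function names are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of classifying risk >= threshold as CAVS-positive."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference curve: every patient is treated as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds (for example 0.01 to 0.99) traces the model's decision curve.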
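The nomogram was built from the 10 highest-weighted variables in the random forest and XGBoost models. The abstract does not say how the two importance rankings were merged, so the sketch below assumes one simple rule purely for illustration: normalize each model's importances to sum to 1, average them, and keep the top k. The function names are hypothetical; in practice the input dicts could be built by zipping feature names with each fitted model's `feature_importances_` attribute.

```python
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale absolute importances so they sum to 1 (all zeros stay zero)."""
    total = sum(abs(w) for w in weights.values())
    if not total:
        return dict.fromkeys(weights, 0.0)
    return {k: abs(w) / total for k, w in weights.items()}


def top_k_combined(rf_imp: Dict[str, float], xgb_imp: Dict[str, float],
                   k: int = 10) -> List[str]:
    """Average normalized importances from two models (an assumed rule) and rank."""
    rf_n, xgb_n = normalize(rf_imp), normalize(xgb_imp)
    features = set(rf_n) | set(xgb_n)
    combined = {f: (rf_n.get(f, 0.0) + xgb_n.get(f, 0.0)) / 2 for f in features}
    # Descending combined weight; ties broken alphabetically for determinism.
    return sorted(combined, key=lambda f: (-combined[f], f))[:k]
```

Normalizing first keeps one model's importance scale from dominating the average.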
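A nomogram is a graphical form of a regression model. Each predictor's contribution to the linear predictor is rescaled to points, the points are summed, and the total is mapped back to a predicted probability. The abstract does not describe the model behind the online nomogram, so the sketch below assumes a logistic regression and the common convention that the predictor with the widest effect range spans 0 to 100 points. All coefficients, ranges, and function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Coefs = Dict[str, float]                 # hypothetical logistic coefficients
Ranges = Dict[str, Tuple[float, float]]  # (min, max) of each nomogram axis


def points_per_lp_unit(coefs: Coefs, ranges: Ranges) -> float:
    """Scale so the predictor with the widest effect range spans 0-100 points."""
    widest = max(abs(b) * (ranges[v][1] - ranges[v][0]) for v, b in coefs.items())
    return 100.0 / widest


def _reference(b: float, lo: float, hi: float) -> float:
    # The axis end that minimizes b * x is assigned 0 points.
    return lo if b >= 0 else hi


def total_points(x: Dict[str, float], coefs: Coefs, ranges: Ranges) -> float:
    scale = points_per_lp_unit(coefs, ranges)
    return sum(scale * b * (x[v] - _reference(b, *ranges[v])) for v, b in coefs.items())


def risk_from_points(points: float, intercept: float, coefs: Coefs, ranges: Ranges) -> float:
    """Map total points back to the linear predictor, then to a probability."""
    scale = points_per_lp_unit(coefs, ranges)
    base = intercept + sum(b * _reference(b, *ranges[v]) for v, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-(base + points / scale)))


def logistic_risk(x: Dict[str, float], intercept: float, coefs: Coefs) -> float:
    """Direct logistic prediction, used to check the nomogram arithmetic."""
    lp = intercept + sum(b * x[v] for v, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-lp))
```

Reading a probability off the total-points axis gives the same answer as evaluating the logistic model directly. An online dynamic nomogram can therefore compute the risk from the model rather than from the drawn axes.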
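The third part evaluated multiclass predictions (no CAVS, mild-to-moderate stenosis, severe stenosis) with accuracy, recall, and F1. The plain-Python per-class report below shows how confusion between the two stenosis grades lowers their recall and F1 while leaving the normal class unaffected. The class labels and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_report(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, object]:
    """Per-class precision/recall/F1 plus overall accuracy and macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count: a confusion matrix
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        support = sum(n for (t, _), n in pairs.items() if t == c)
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        recall = tp / support if support else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": support}
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    macro_f1 = sum(r["f1"] for r in report.values()) / len(labels)
    return {"per_class": report, "accuracy": accuracy, "macro_f1": macro_f1}
```

The macro average weights each class equally, so errors in a small severe-stenosis group are not hidden by a large, easily classified normal group.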
Keywords/Search Tags: Aortic valve stenosis, Degenerative disease, Risk factors, Disease prediction model