Prediction Of Disease Risk Based On Ensemble Cox Regression Analysis

Posted on: 2024-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
GTID: 2530306923475394
Subject: Statistics
Abstract/Summary:
In the context of consumption upgrading, the public's desire for a better life keeps growing, and the demand for health management has risen sharply with it. In particular, using scientific knowledge to prevent the onset of disease is attracting great interest. Statistical risk prediction models that estimate an individual's probability of developing a disease make it easier to screen high-risk people early. Identifying the risk factors associated with disease onset or prognosis also supports targeted intervention and health management that can slow or even halt disease progression. However, disease risk prediction often involves imbalanced survival data, i.e., data in which the proportions of positive (diseased) and negative samples differ greatly. Misclassifying a diseased individual as healthy costs more than the reverse error. Researchers are therefore especially interested in improving a model's accuracy in identifying high-risk individuals, and this poses a serious challenge for traditional statistical models applied to disease prediction.

Focusing on such imbalanced survival data, this thesis proposes a disease risk prediction model based on ensemble Cox regression analysis, referred to as the EPCR model. Its main idea is to improve predictive performance by combining the predictions of multiple base learners through simple averaging. The model is built in three steps:

1. The majority-class samples are randomly down-sampled to a subset of the same size as the minority class, and the minority-class samples are bootstrap-sampled. The two are combined into a class-balanced subset.
2. This process is repeated several times to generate multiple class-balanced subsets, and a penalized Cox regression model is trained independently on each one.
3. For each individual to be predicted, the predicted probabilities from the individual models are simply averaged to give the final prediction.

Building on this ensemble framework, the thesis also defines a reliable measure of the association between risk factors and disease occurrence. This measure offers a more robust way to screen for risk factors strongly associated with disease onset and can assist early intervention.

Numerical simulations show that, compared with several single models, the EPCR model significantly improves predictive performance and accurately screens important variables. The EPCR model was then applied to the Chinese Breast Cancer Cohort Study (BCCS-CW) database to develop a breast cancer risk assessment model. The BCCS-CW database contains 72 factors covering all aspects of women's health, and each of them was pre-processed with the PSIS pre-screening procedure. Compared with the classical Gail model, the proposed EPCR algorithm improves the accuracy of incidence prediction to some extent. It also provides a robust ranking of breast cancer risk factors by their association with the disease.
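The balanced-subset ensemble described above can be sketched in a few dozen lines. This is a minimal, standard-library-only illustration under stated assumptions, not the thesis's implementation. The abstract does not specify:

- the penalty type,
- the number of base learners,
- how each model's probability is computed.

The sketch therefore assumes the following:

- Each base learner is a ridge (L2) penalized Cox model, fitted by plain gradient ascent on the Breslow partial likelihood.
- Each model's predicted probability is the absolute risk by a chosen horizon, using the Breslow baseline hazard.

The function names `fit_ridge_cox`, `breslow_cum_hazard` and `epcr_predict`, their parameters, and the synthetic data are all hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_ridge_cox(X, times, events, lam=0.05, lr=0.5, n_iter=200):
    """Hypothetical base learner: ridge (L2) penalized Cox model, fitted by gradient
    ascent on (1/n) * Breslow partial log-likelihood - (lam / 2) * ||beta||^2.
    Assumes covariates are roughly standardized so the fixed step size is stable."""
    n, p = len(X), len(X[0])
    # Risk set of subject i: everyone still under observation at times[i] (ties included).
    risk = [[j for j in range(n) if times[j] >= times[i]] for i in range(n)]
    beta = [0.0] * p
    for _ in range(n_iter):
        w = [math.exp(_dot(beta, row)) for row in X]
        grad = [0.0] * p
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only through the risk sets
            s0 = sum(w[j] for j in risk[i])
            for k in range(p):
                s1 = sum(w[j] * X[j][k] for j in risk[i])
                grad[k] += X[i][k] - s1 / s0
        beta = [b + lr * (g / n - lam * b) for b, g in zip(beta, grad)]
    return beta


def breslow_cum_hazard(X, times, events, beta, t):
    """Breslow estimate of the baseline cumulative hazard H0(t)."""
    w = [math.exp(_dot(beta, row)) for row in X]
    total = 0.0
    for i in range(len(X)):
        if events[i] and times[i] <= t:
            total += 1.0 / sum(w[j] for j in range(len(X)) if times[j] >= times[i])
    return total


def epcr_predict(X, times, events, X_new, horizon, n_models=10, lam=0.05, seed=0):
    """Average the absolute risks by `horizon` from Cox models fitted on balanced subsets."""
    rng = random.Random(seed)
    cases = [i for i, e in enumerate(events) if e]         # minority class: diseased
    controls = [i for i, e in enumerate(events) if not e]  # majority class: not diseased
    m = len(cases)
    avg = [0.0] * len(X_new)
    for _ in range(n_models):
        # Bootstrap the cases (with replacement) and down-sample the controls
        # (without replacement) to the same size m: one class-balanced subset.
        idx = rng.choices(cases, k=m) + rng.sample(controls, m)
        Xs = [X[i] for i in idx]
        ts = [times[i] for i in idx]
        es = [events[i] for i in idx]
        beta = fit_ridge_cox(Xs, ts, es, lam=lam)
        h0 = breslow_cum_hazard(Xs, ts, es, beta, horizon)
        for r, x in enumerate(X_new):
            # Absolute risk: 1 - S(horizon | x) = 1 - exp(-H0(horizon) * exp(beta . x))
            avg[r] += (1.0 - math.exp(-h0 * math.exp(_dot(beta, x)))) / n_models
    return avg


# Demo on synthetic data (hypothetical): x[0] raises the hazard, x[1] is noise,
# and follow-up ends at t = 2, so most subjects are censored (imbalanced classes).
rng = random.Random(1)
X, times, events = [], [], []
for _ in range(150):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    t = rng.expovariate(0.1 * math.exp(1.5 * x[0]))
    X.append(x)
    times.append(min(t, 2.0))
    events.append(1 if t <= 2.0 else 0)

preds = epcr_predict(X, times, events, [[1.0, 0.0], [-1.0, 0.0]], horizon=2.0, n_models=5)
print(preds)
```

The ridge penalty is used here only because its gradient is smooth and easy to code by hand. A LASSO or elastic-net Cox base learner would slot into the same resampling-and-averaging loop.

Balancing also changes the event share: it rises to 50% in every subset. As a result, the averaged values are inflated relative to the original population. They are safest read as relative risk scores unless they are recalibrated.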
Keywords/Search Tags: Disease Risk Prediction, Imbalanced Survival Data, Ensemble Learning, Cox Regression Analysis, Absolute Risk