Research On Support Vector Hazard Machine Model For Interval Censored Data In Counting Process

Posted on: 2024-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: G J Wang
Full Text: PDF
GTID: 2530307085468014
Subject: Applied Statistics
Abstract/Summary:
Censored data arise in many fields, and interval-censored data are especially common in biomedical research and related areas. At present, most scholars rely on traditional statistical methods such as the accelerated failure time model and the proportional hazards model. Because of the particular structure of interval-censored data, these methods often predict poorly and tend to overfit, especially on small, nonlinear data sets. The recent development of machine learning, with its strong prediction, regression, and classification capabilities, offers new ideas for predicting survival time. Since machine learning has seldom been combined with interval-censored data, this thesis uses a counting process to represent event-time data and links the support vector machine, which performs well on small nonlinear samples in supervised learning, with hazard regression from standard survival analysis, so that supervised learning techniques can be used for prediction with interval-censored data. The research has two parts.

In the first part, a support vector hazard machine (SVHM) model based on right-censored data is proposed for interval-censored data under single and multiple imputation. Single imputation uses midpoint substitution to convert interval-censored data into right-censored data, which are then modeled with the SVHM. Multiple imputation is based on chained equations and the PMDA algorithm, and is combined with the SVHM in an algorithm for interval-censored data; the predicted survival time is obtained by iterative solution. The model's hyperparameters are optimized by a genetic algorithm. In the simulation study, several imputed SVHMs are compared by correlation coefficient and root mean square error under different censoring ratios, noise levels, and limited sample sizes. The methods are then applied to a type I interval-censored mouse tumor dataset and a type II interval-censored dataset of patients infected with AIDS through drug abuse, verifying that imputation-based SVHMs for interval-censored data also perform well in real data analysis.

In the second part, the thesis starts from the structure of type I interval-censored data itself. Based on the counting process for type I interval-censored data, the prediction of survival time is transformed into the prediction of binary outcomes, and a decision objective function is established. The support vector machine from supervised learning is combined with hazard regression from standard survival analysis to calculate a risk score. Drawing on the idea of the K-nearest-neighbor algorithm, an improved model, the Interval Censoring Support Vector Hazard Machine (IC-SVHM), is established for interval-censored data. Without modeling the censoring distribution, the model predicts a sample's survival time by estimating its risk score, so that type I interval-censored data need no imputation and survival time can be predicted directly. The model's hyperparameters are optimized by a genetic algorithm. In the simulation study, IC-SVHM is compared with several imputed SVHMs using correlation coefficients and root mean square errors under different censoring ratios, noise levels, and limited sample sizes, confirming that the proposed IC-SVHM outperforms the imputed SVHMs. The proposed method is also applied to the mouse tumor dataset, verifying that IC-SVHM performs well in real data analysis.
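The abstract gives no code, so the following is only a minimal sketch of two steps it names: midpoint substitution, and the counting-process view that turns survival prediction into binary outcomes. The function names, the `(left, right)` tuple layout, the use of an infinite right endpoint to mark an unobserved event, and the +1/-1 labelling over at-risk subjects are all assumptions made for illustration; they are not taken from the thesis.

```python
import math


def midpoint_impute(intervals):
    """Turn interval-censored records into right-censored (time, event) pairs.

    Assumed layout: each record is (left, right). A finite right endpoint means
    the event fell inside the interval, so the midpoint stands in for the event
    time. An infinite right endpoint means the event was never observed, so the
    record is treated as right-censored at `left`.
    """
    imputed = []
    for left, right in intervals:
        if math.isinf(right):
            imputed.append((left, 0))                   # censored at last visit
        else:
            imputed.append(((left + right) / 2.0, 1))   # midpoint as event time
    return imputed


def counting_process_labels(imputed):
    """Expand (time, event) pairs into binary labels at each event time.

    For every distinct event time t, each subject still at risk (time >= t)
    contributes one row (subject_index, t, label), with label +1 if that
    subject's event occurs at t and -1 otherwise.
    """
    event_times = sorted({t for t, d in imputed if d == 1})
    rows = []
    for t in event_times:
        for i, (time_i, d_i) in enumerate(imputed):
            if time_i >= t:
                label = 1 if (d_i == 1 and time_i == t) else -1
                rows.append((i, t, label))
    return rows


# Hypothetical toy data: two observed intervals and one unobserved event.
intervals = [(2.0, 4.0), (1.0, math.inf), (5.0, 7.0)]
imputed = midpoint_impute(intervals)
rows = counting_process_labels(imputed)
```

Rows of this form could then be passed to any binary classifier with sample features attached; the choice of kernel, the decision objective, and the risk score used in the thesis are not reproduced here.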
Keywords: Interval censored, Support Vector Machine, Multiple imputation, Risk score, Counting Process