| Objective: Gene expression data typically contain many more expression values than individuals, and the variables are collinear, so the data do not satisfy the requirements of classical statistical methods. Standard Cox regression therefore cannot be applied directly to predict survival from gene expression data. Supervised principal components analysis and partial least squares Cox regression address these problems by combining the Cox proportional hazards model with dimension reduction. Through simulation experiments and real-data analyses, we aim to characterize the relationship between gene expression and death, in order to provide more accurate prognoses and improve treatment strategies for patients. Method: This paper introduces the principles and methods of supervised principal components analysis and partial least squares Cox regression. Based on the features of gene expression data, we designed a series of simulation experiments and analyzed the simulated data with both methods to compare their prediction performance. Three publicly available data sets were also analyzed with both methods. Simulation data sets were generated with the R statistical software; the simulation experiments and the real-life data were analyzed with MATLAB 7.1. Result: We used the deviance and the coefficient of determination R² to evaluate the predictors. The simulation experiments show that the results depend strongly on the variance scenario: both methods perform better as the variance of the two blocks of genes with effects increases. Both methods also perform better as the within-group correlation ρ increases, and both perform worse as the censoring rate increases. In the real-data analyses, which method performs better varies from data set to data set. Conclusion: Supervised principal components analysis and partial least squares Cox regression can both be applied to survival prediction from gene expression data. The prediction performance of supervised principal components regression is better than that of partial least squares Cox regression. Partial least squares Cox regression is generally faster to compute, which is an advantage over supervised principal components. |
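
As a rough illustration of the kind of pipeline described in the abstract, the following sketch simulates expression data with two correlated blocks of genes that affect survival, screens genes by a univariate Cox score statistic, and extracts the first principal component of the selected genes (the core step of supervised principal components). It is a minimal sketch under stated assumptions, not the authors' implementation: the original work used R and MATLAB 7.1, whereas this example uses Python with NumPy, and the function names, block size, effect size, censoring scale, and score threshold are all hypothetical choices made for illustration. Ties in survival times are ignored.

```python
import numpy as np

def simulate(n=200, p=200, block=10, rho=0.6, effect=0.1,
             censor_scale=3.0, seed=0):
    """Simulate expression data with two correlated gene blocks that
    affect survival (all parameter values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Induce within-block correlation rho through a shared latent factor.
    for start in (0, block):
        shared = rng.standard_normal((n, 1))
        X[:, start:start + block] = (np.sqrt(rho) * shared
                                     + np.sqrt(1 - rho) * X[:, start:start + block])
    # Block 1 raises the hazard, block 2 lowers it; remaining genes are noise.
    eta = effect * (X[:, :block].sum(axis=1)
                    - X[:, block:2 * block].sum(axis=1))
    t_event = rng.exponential(1.0 / np.exp(eta))   # exponential survival times
    t_cens = rng.exponential(censor_scale, size=n)  # independent censoring
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)         # 1 = death observed
    return X, time, event

def cox_scores(X, time, event):
    """Univariate Cox score statistic (evaluated at beta = 0) per gene."""
    order = np.argsort(-time)  # descending time: risk sets are cumulative
    Xs, ev = X[order], event[order].astype(bool)
    n_risk = np.arange(1, len(time) + 1)[:, None]
    mean_risk = np.cumsum(Xs, axis=0) / n_risk
    var_risk = np.cumsum(Xs ** 2, axis=0) / n_risk - mean_risk ** 2
    U = (Xs[ev] - mean_risk[ev]).sum(axis=0)  # score
    V = var_risk[ev].sum(axis=0)              # information
    return U / np.sqrt(V)

def supervised_pc(X, time, event, threshold=2.0, n_comp=1):
    """Keep genes with |score| above the threshold, then return the
    leading principal component(s) of the centered, selected genes."""
    z = cox_scores(X, time, event)
    keep = np.flatnonzero(np.abs(z) > threshold)
    if keep.size == 0:
        raise ValueError("no gene passes the score threshold")
    Xk = X[:, keep] - X[:, keep].mean(axis=0)
    U, s, _ = np.linalg.svd(Xk, full_matrices=False)
    return keep, U[:, :n_comp] * s[:n_comp]

if __name__ == "__main__":
    X, time, event = simulate()
    keep, pcs = supervised_pc(X, time, event)
    print("censoring rate:", 1 - event.mean())
    print("genes selected:", keep.size)
```

The resulting component scores would then enter a Cox proportional hazards model as covariates; in practice the threshold and the number of components are usually chosen by cross-validation. Varying `rho`, `effect`, and `censor_scale` corresponds loosely to the correlation, variance, and censoring scenarios mentioned in the Result section.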