
Ultra-high Dimensional Missing Data Analysis Based On Model Averaging

Posted on: 2022-08-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Gao  Full Text: PDF
GTID: 2480306539953399  Subject: Mathematics
Abstract/Summary:
With the rapid development of science and technology and the continuous improvement of data collection capabilities, ultra-high dimensional data are encountered more and more frequently. The sheer volume of such data makes them difficult to analyze. Moreover, in medicine, genetics, sociology and many other fields, ultra-high dimensional data may not be completely observed, and incompletely observed data are harder to analyze than complete data. It is therefore worthwhile to study ultra-high dimensional data that cannot be observed completely. This thesis studies ultra-high dimensional data with responses missing at random. The specific contents are as follows.

The first chapter systematically introduces the research background and significance of this thesis. It also reviews the state of research on ultra-high dimensional data at home and abroad, and outlines the main contents and innovations of the thesis.

The second chapter proposes two feature screening methods. The first, called CQFS, is based on the conditional quantile of the predictor X given the response Y. The second, called MACQFS, combines this conditional quantile with the idea of model averaging. Both methods satisfy the sure screening property. Monte Carlo simulations and an analysis of lung cancer data show that both CQFS and MACQFS screen robustly and perform well in practice. Of the two, MACQFS performs better.

Building on the second chapter, the third chapter proposes the MMACQ feature screening method, which combines the conditional quantile of the response Y given the predictor X, the inverse probability weighting technique, and the idea of model averaging. The method is designed to reduce the dimension of ultra-high dimensional data with responses missing at random. It is proved that MMACQ satisfies the sure screening property, and numerical simulations confirm its screening performance.

For ultra-high dimensional data with responses missing at random, the fourth chapter proposes a 'two-step' analysis method. In the first step, the MMACQ feature screening method from the third chapter is used to reduce the dimension. In the second step, a GMA algorithm, which combines augmented inverse probability weighting with Mallows' criterion, is proposed to analyze the data after dimension reduction. Numerical simulations and a cardiomyopathy microarray data set are used to verify the feasibility of this 'two-step' method, and the results show that it can effectively reduce the prediction error. These experiments also confirm the practical value of the MMACQ feature screening method.

The fifth chapter summarizes the contents of this thesis in detail, points out some shortcomings, and discusses directions for future work in light of them.
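The abstract names inverse probability weighting as the device for handling responses missing at random but gives no formulas. As a rough illustration only, the sketch below shows the general idea on simulated data: each observed response is weighted by the inverse of its observation probability, which corrects the bias of a complete-case average. This is not the MMACQ or GMA procedure from the thesis. The data-generating model, the logistic missingness mechanism, the assumption that the observation probabilities are known, and the names `ipw_mean` and `simulate` are all assumptions made for this example.

```python
import math
import random


def ipw_mean(y, observed, propensity):
    """Normalized inverse probability weighted mean of y.

    Only observed responses contribute; each is weighted by 1 / P(observed | x).
    """
    num = 0.0
    den = 0.0
    for yi, di, pi in zip(y, observed, propensity):
        if di:
            num += yi / pi
            den += 1.0 / pi
    return num / den


def simulate(n=20000, seed=0):
    """Hypothetical data: y depends on x, and y is more often observed when x is large."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # population mean of y is 0
    # Missing at random: the observation probability depends on x only (assumed known here).
    propensity = [1.0 / (1.0 + math.exp(-(0.5 + xi))) for xi in x]
    observed = [rng.random() < p for p in propensity]
    return x, y, observed, propensity


x, y, observed, propensity = simulate()

# Mean of all responses, available only because this is a simulation.
full_mean = sum(y) / len(y)

# Naive estimate that simply drops the missing responses.
complete_cases = [yi for yi, di in zip(y, observed) if di]
complete_case_mean = sum(complete_cases) / len(complete_cases)

# Weighted estimate that accounts for the missingness mechanism.
ipw_estimate = ipw_mean(y, observed, propensity)

print("full-data mean:     ", round(full_mean, 3))
print("complete-case mean: ", round(complete_case_mean, 3))
print("IPW mean:           ", round(ipw_estimate, 3))
```

In practice the observation probabilities are unknown and have to be estimated, for example with a parametric or nonparametric model of the missingness indicator given the predictors. The sketch assumes them known to keep the weighting step itself in focus.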
Keywords/Search Tags: Ultra-high dimensional missing data, Conditional quantile, Model averaging, Inverse probability weighted technique