The Study Of Imputation Methods For Missing Values Based On LASSO In Compositional Data

Posted on: 2018-04-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Tian | Full Text: PDF
GTID: 2359330521451775 | Subject: Statistics
Abstract/Summary:
In fields such as social research, biomedicine, and economic management, collected data often contain a large number of missing values for a variety of reasons. Meanwhile, in fields such as gene and life sciences and financial mathematics, advances in science and technology keep raising the dimension of the available data, so high-dimensional data sets appear frequently. The complexity of missing values and the high dimensionality of the data make traditional statistical methods unsuitable, and how to carry out effective statistical inference with missing values and high-dimensional data has therefore attracted the attention of many scholars. For missing data, research over the past 80 years has produced many results, and a series of efficient methods for handling missing values has been proposed. For high-dimensional data, the data sets are often sparse in essence, so variable selection becomes one of the core problems.

Compositional data are mainly used to study the proportions that the parts make up of a whole. The usual approach is to transform compositional data from the simplex space into ordinary data in Euclidean space and then analyze the transformed data statistically. The transformation process, or subjective and objective factors during data collection, can leave compositional data with a large number of missing values. Estimating these missing values to obtain a complete data set is the primary task in the statistical analysis of compositional data.

This thesis proposes a new imputation method for high-dimensional compositional data and compares it with mean imputation, k-nearest neighbor imputation, and iterative regression imputation. The research concerns how to handle missing data together with variable selection, how to use appropriate methods to treat models with missing data, and how to preprocess missing data. It includes the following work: (1) a study of the causes of missing data and of missing-data patterns and mechanisms; (2) a study of the various methods for handling missing data; (3) a study of missing-data methods based on variable selection; (4) a comparative analysis of the methods using simulations and real examples; (5) a summary of the thesis and directions for further research.
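The general idea described above (move the compositions to Euclidean space, impute there with a sparse regression, then map back) can be illustrated with a minimal sketch. It is not the thesis's algorithm, which the abstract does not specify. The sketch assumes an additive log-ratio transform against a last part that is always observed, a plain coordinate-descent LASSO, and mean-initialized iterative regression. The names `lasso_cd`, `alr`, `alr_inv`, and `lasso_impute`, and the defaults for `lam` and `n_sweeps`, are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                      # constant column: leave at zero
            resid += Xc[:, j] * beta[j]       # remove column j's contribution
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * beta[j]       # add the updated contribution back
    return y_mean - x_mean @ beta, beta

def alr(comp):
    """Additive log-ratio transform; the last part is the reference."""
    return np.log(comp[:, :-1] / comp[:, -1:])

def alr_inv(z):
    """Inverse alr: back to compositions whose rows sum to 1."""
    expz = np.exp(np.column_stack([z, np.zeros(len(z))]))
    return expz / expz.sum(axis=1, keepdims=True)

def lasso_impute(comp, lam=0.01, n_sweeps=10):
    """Impute NaN parts by iterative LASSO regression on alr coordinates.

    Assumes the last part is fully observed and every other part has
    at least one observed value.
    """
    z = alr(comp)                             # NaN parts give NaN coordinates
    miss = np.isnan(z)
    z = np.where(miss, np.nanmean(z, axis=0), z)   # start from column means
    for _ in range(n_sweeps):
        for j in np.where(miss.any(axis=0))[0]:
            obs = ~miss[:, j]
            others = np.delete(z, j, axis=1)
            b0, b = lasso_cd(others[obs], z[obs, j], lam)
            z[miss[:, j], j] = b0 + others[miss[:, j]] @ b
    return alr_inv(z)

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(40, 5))
comp = raw / raw.sum(axis=1, keepdims=True)
holes = comp.copy()
holes[0, 1] = np.nan
holes[3, 2] = np.nan
filled = lasso_impute(holes)
```

Working in log-ratio coordinates keeps the ratios between observed parts unchanged by imputation, and the L1 penalty sets the coefficients of uninformative predictors to zero, which is where variable selection enters in high-dimensional settings. In practice `lam` would be chosen by cross-validation rather than fixed.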
Keywords/Search Tags: compositional data, high-dimensional data, missing values, variable selection, LASSO algorithm