
Some Studies On Model Sparseness

Posted on: 2013-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B G An
Full Text: PDF
GTID: 1220330395471087
Subject: Probability theory and mathematical statistics

Abstract/Summary:
With the development of science and technology, many complex datasets are collected in real practice. In the statistical analysis of such datasets, how to build a simple, high-performance model is very important. An appropriate sparse model not only has good interpretability but also very high performance. In this dissertation we carry out several sparseness studies of statistical models.

So far, sparse methods for the linear model have been developed very well, so we first summarize sparse methods for the linear regression model. There are also many sparse studies of the multivariate linear model; however, most of them focus only on the predictors, and sparse studies of the multivariate response are very few. In Chapter 2 we study the multivariate linear regression model, where sparseness is imposed not only on the predictors but also on the multidimensional response. We first study the relationship between the multivariate linear regression model and canonical correlation analysis, and then translate the sparse multivariate linear regression problem into a sparse canonical loadings problem. Theoretical results show that our method has selection consistency, and numerical simulation studies confirm these theoretical results.

Supervised classification learning has applications in many real fields, including medical diagnosis, handwriting recognition, web mining, text classification, and others. Plenty of supervised classification methods have been proposed, for example linear and quadratic discriminant analysis, logistic regression, the nearest-neighbor method, naive Bayes, the support vector machine (SVM), and many others. Among these methods, the naive Bayes classifier is very popular due to its computational simplicity and satisfactory performance. However, to the best of our knowledge, how to test its statistical significance under an ultra-high-dimensional setup is not well studied.
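To fix ideas, the naive Bayes classifier discussed above assumes the features are conditionally independent given the class. The following is a minimal Gaussian naive Bayes sketch of our own (illustrative only; the class name, data, and parameters are our assumptions, and the dissertation's significance test statistic is not reproduced here):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: features treated as conditionally
    independent given the class label."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Per-class feature means and variances (variance floored for stability).
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        # Log-likelihood sums over independent Gaussian features, plus log prior.
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(ll + self.logprior, axis=1)]

# Toy two-class data: class 1 has its mean shifted in every coordinate.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, 5)) + 2.0 * y[:, None]
acc = (GaussianNB().fit(X, y).predict(X) == y).mean()
```

With well-separated class means as above, the training accuracy is close to one; in the ultra-high-dimensional setting the dissertation targets, most features carry no signal, which is what motivates testing each feature's significance.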
In Chapter 3 we propose a novel test statistic to test the statistical significance of the ultra-high-dimensional naive Bayes classifier. Theoretical results ensure the asymptotic normality of our statistic, and numerical simulation studies demonstrate the theoretical findings. We also perform variable selection for the ultra-high-dimensional naive Bayes classifier using the proposed test statistic; by doing so we obtain a sparse naive Bayes classifier that not only maintains classification performance but also has good interpretability.

For many statistical data analysis methods, a good estimate of the (inverse) covariance matrix is usually necessary. Such methods include, but are not limited to, linear (quadratic) discriminant analysis, principal component analysis, canonical correlation analysis, Gaussian graphical models, and many others. Traditionally, the sample covariance matrix is a good choice for estimating the covariance matrix. However, with the development of technology, in many scientific fields people can collect a very large number of variables per subject, while the sample size is often limited for practical reasons. In this case the dimension is much larger than the sample size, so the sample covariance matrix is no longer positive definite, although positive definiteness is required by most multivariate statistical analysis methods. Consequently, it is necessary to search for a good estimator of the (inverse) covariance matrix in this high-dimensional case. In Chapter 4 we propose a novel hypothesis-test method to estimate the order of the inverse covariance matrix. Theoretical findings show that our proposed test statistic is asymptotically standard normal, and numerical simulation studies demonstrate these theoretical results; the simulations also show that our method estimates the order of the inverse covariance matrix correctly in most situations.
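The failure of the sample covariance matrix when the dimension exceeds the sample size is easy to see directly: with n observations of p variables, the sample covariance has rank at most n - 1, so for p > n it is singular. A small demonstration under assumed toy dimensions (our own example, not taken from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50                       # far more variables than observations
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)         # p x p sample covariance matrix

# The rank of S is at most n - 1, so S is singular: its smallest
# eigenvalue is (numerically) zero and S is not positive definite.
rank = np.linalg.matrix_rank(S)
eigmin = np.linalg.eigvalsh(S).min()
```

This rank deficiency is exactly why regularized or structured estimators, such as the order-constrained inverse covariance estimate studied in Chapter 4, are needed in the high-dimensional setting.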
Keywords/Search Tags: Model sparsity, Variable selection, Lasso, Naive Bayes, Selection consistency, Multivariate linear regression, Canonical correlation analysis, Hypothesis test, Inverse covariance matrix