
Study On Partial Least Distance Square Regression And Correlation Analysis Methods

Posted on: 2024-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Du
GTID: 2530307142963359
Subject: Computer Science and Technology
Abstract/Summary:
Partial least squares (PLS) is a multivariate linear statistical analysis method that integrates principal component analysis (PCA), canonical correlation analysis (CCA), and multiple linear regression (MLR). It effectively handles multicollinearity among variables and small sample sizes. However, PLS has two shortcomings. First, PLS extracts latent variables by maximizing the Pearson correlation coefficient between the independent and dependent variables. Because the Pearson correlation coefficient usually cannot measure nonlinear relationships between variables, the extracted latent variables are not guaranteed to have the strongest explanatory power. Second, PLS applies MLR to the extracted latent variables, which cannot truly reflect nonlinear relationships in the data, so the regression function usually underfits. These two design choices are the main reasons for the low regression accuracy and prediction performance of PLS on nonlinear data. It is therefore necessary to propose a regression method suitable for both linear and nonlinear data, one that retains the framework of PLS regression while achieving better regression performance and accuracy. This thesis carries out the following research work:

(1) Partial least distance squares regression method. A new partial least distance squares (PLDS) regression method is proposed to address the inability of partial least squares to meet the analysis requirements of data with nonlinear relationships. First, the method calculates the Euclidean distances between samples of the variables to obtain distance matrices for the independent and dependent variables respectively. Second, distance components of the independent and dependent variables are extracted by maximizing the distance variance and distance correlation coefficient of the original variables. Finally, quasi-linear regression is performed on the extracted distance components. Experiments show that PLDS achieves better regression results and is more widely applicable, regardless of whether nonlinear relationships exist between the variables.

(2) Optimized partial least distance squares regression method. The regression equation constructed in the regression part of PLDS is a quasi-linear equation expressed in terms of distances; its structure is complex, and it cannot directly reflect the functional relationship between the original variables. An optimized partial least distance squares (OPLDS) regression method is therefore proposed. Because the regression equation obtained by the PLDS model is expressed in terms of distances, it must be transformed into a representation in the original data before prediction can be performed. In research item (1), however, the conversion of the distance-based regression equation into the original data is relatively simple, and the conversion error is large. Although PLDS performs better than PLS, its advantages over support vector machines, ridge regression, and decision tree regression are not obvious. A better solution is therefore proposed for the conversion step of the PLDS regression equation, which reduces the conversion error and improves the accuracy of the model.

(3) A new method for detecting complex associations between variables with randomness. PLS uses the Pearson coefficient to measure the correlation between variables, but the Pearson coefficient cannot measure nonlinear relationships, and most correlation analysis methods do not consider the impact of data uncertainty and distribution, so regularity information between variables is neglected during correlation evaluation. This thesis presents a new analysis method for measuring the correlation between random variables. It divides the relationship between variables into two cases: variables that have a functional relationship but are subject to specific distributions, and variables without an explicit functional relationship. First, cubic B-spline approximate fitting is used to regress the variable data, and the regression error of the model is calculated to evaluate the degree of functional relationship between the variables. Then, the normalized information entropy between the variables is calculated to evaluate the degree of uncertainty between them. Finally, a copula function is used to evaluate the dependence on the random distribution of the variables. The calculated regression error, normalized information entropy, and copula correlation coefficient are weighted and summed using the analytic hierarchy process (AHP) method to obtain the final correlation coefficient R. Because the new method considers the degree of functional relationship, the degree of uncertainty, and the degree of dependence on the random distribution of the variables, it can measure the correlation of functionally related variables with specific distributions and can also better evaluate the correlation of variables without a clear functional relationship.
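
To make the distance-based quantities in research item (1) concrete, the following is a minimal sketch of the standard sample distance correlation computed from Euclidean distance matrices. It is not the thesis's PLDS algorithm: the component extraction and quasi-linear regression steps are not specified in this abstract and are not attempted here. The function names (distance_matrix, double_center, distance_correlation) are hypothetical and chosen only for illustration.

```python
import numpy as np


def distance_matrix(x):
    """Pairwise Euclidean distances between the samples (rows) of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples of one variable
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y, a value in [0, 1]."""
    a = double_center(distance_matrix(x))
    b = double_center(distance_matrix(y))
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # one variable is constant
    # Clip at zero to guard against tiny negative rounding errors.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# A purely nonlinear relationship: y = x^2 on a grid symmetric about zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
print(f"Pearson: {pearson:.3f}, distance correlation: {dcor:.3f}")
```

On this symmetric quadratic example the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which illustrates the motivation stated above for replacing Pearson correlation with a distance-based measure.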
Keywords/Search Tags: Partial least square, Distance correlation coefficient, Nonlinear regression, Correlation analysis, Copula function, The approximate fitting based on cubic B-spline