
Research On Compression Regression And Classification Algorithm

Posted on: 2020-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: H J Deng
Full Text: PDF
GTID: 2370330602452312
Subject: Engineering
Abstract/Summary:
With the growing prevalence of large-scale, high-dimensional data in research and industry, there is increasing demand for scalable computing techniques for data analysis and knowledge discovery. The key to turning such data into knowledge is learning efficient statistical models. Limited computer performance makes it difficult to learn models from large-scale data sets on a single machine, and existing methods for learning statistical models either incur high computational cost when applied to large-scale data or produce models with weak learning ability. To address these two problems, this thesis studies algorithms and applications for compressed regression and classification. It proposes a matrix compression method based on fixed-length encoding that supports fast random access, and on this basis develops partial least squares regression and partial least squares logistic regression algorithms over the compressed matrix, improving both the scalability and the learning ability of the partial least squares method. The details are as follows.

First, to overcome the limitations of processing large-scale data sets on a single machine, this thesis designs BCSM (Blocked Compressed Sparse Matrix), a matrix compression algorithm based on fixed-length encoding. BCSM compresses the indexes of all non-zero positions in matrix row order, block by block, while still supporting fast random access to the matrix; this matters because classification and regression models involve a large number of matrix operations.

Second, the compressed matrix is applied to a machine learning model, and NFPLS, a partial least squares algorithm based on the compressed matrix, is proposed. The method divides feature extraction into two steps. First, the principal component analysis step of the traditional partial least squares algorithm is replaced by a restricted Boltzmann machine, a neural network model that extracts low-dimensional features capable of expressing the original data; this improvement removes the original method's inability to adequately capture the non-linear characteristics of the data. Then, the residual calculation formula is modified to suit the compressed matrix, and canonical correlation analysis and linear regression are used to build the learning model on the extracted features.

To reduce the time complexity of training the restricted Boltzmann machine, this thesis further improves the contrastive divergence algorithm, identifying and eliminating redundant computation in two ways. The first focuses on the sampling process: by maintaining upper and lower bounds on the conditional probability of each sampling unit, unnecessary calculation can be skipped. The second focuses on the computation of conditional probabilities themselves, reusing historical results to speed up the calculation. These two improvements accelerate the construction of the NFPLS model.

Experiments were conducted on a variety of data sets, including UCI standard data sets and large-scale data sets preprocessed with word segmentation and TF-IDF. Several sparse matrix compression algorithms were compared in terms of compression time, compression ratio, and access performance; the results show that the proposed BCSM method has overall advantages in access performance and compression ratio. Likewise, the learning accuracy of NFPLS was compared against similar algorithms using several standard evaluation metrics, and it achieved excellent regression and classification accuracy.
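The abstract does not specify the exact BCSM storage layout, but its two stated ideas — fixed-length encoding of non-zero column indexes in row order, and random access without full decompression — can be illustrated with a minimal sketch. The class name `BCSMSketch`, the choice of 16-bit index codes, and the per-row offset table are assumptions for illustration only, not the thesis's actual format.

```python
from array import array

class BCSMSketch:
    """Hypothetical sketch of a BCSM-like layout: non-zero column
    indexes stored with a fixed-width integer code ('H' = 16 bits)
    in row order, plus a per-row offset table so any single row can
    be reconstructed in time proportional to its non-zero count."""

    def __init__(self, rows):
        # number of columns, assuming all rows have equal length
        self.n_cols = max((len(r) for r in rows), default=0)
        self.offsets = array('L', [0])   # cumulative non-zero counts per row
        self.cols = array('H')           # fixed-length column indexes
        self.vals = array('d')           # corresponding non-zero values
        for row in rows:
            for j, v in enumerate(row):
                if v != 0:
                    self.cols.append(j)
                    self.vals.append(v)
            self.offsets.append(len(self.cols))

    def row(self, i):
        """Random access: rebuild dense row i without touching other rows."""
        start, end = self.offsets[i], self.offsets[i + 1]
        out = [0.0] * self.n_cols
        for k in range(start, end):
            out[self.cols[k]] = self.vals[k]
        return out
```

Because each stored index has a fixed width, the offset table alone locates any row's compressed slice directly, which is what makes the fast random access claimed for BCSM plausible even on a compressed representation.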
Keywords/Search Tags: partial least squares, restricted Boltzmann machine, compression and random access, classification and regression