Research And Parallel Application Of Supervised Learning Algorithms For Large-scale Data Classification Problems

Posted on: 2019-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Yang
GTID: 2428330590474055
Subject: Computational Mathematics
Abstract/Summary:
In recent years, rapid advances in science and technology have led nearly every industry to generate massive amounts of data, and attention has turned to how to use that data most effectively. Machine learning and deep learning have grown rapidly as a result. Classification, one of the most typical applications of both, has become an active research direction in industry and academia, and building efficient, appropriate classification models has become an important problem. At the same time, single-threaded floating-point performance has approached a bottleneck, and the read/write and transmission capacity of storage hardware and network services has grown only slowly, so these resources alone cannot keep up with the growth in data scale. Distributed parallel computing frameworks have therefore attracted increasing attention, and designing a suitable parallel framework for a given algorithm is an emerging research topic.

To address these needs, this thesis proposes a complete pipeline: it designs an improved machine learning model, applies efficient optimization methods to it, and runs the resulting optimization algorithms in parallel. The work focuses on multi-class classification in supervised learning: a classification model is built on a training set with class labels, its parameters are fitted, and the fitted model is used to predict labels on a test set whose class labels are unknown. The thesis first proposes encoding the k class labels as the coordinates of k vertices in (k-1)-dimensional Euclidean space and adding a noise-reduction function and a suitable penalty term, which yields a new linear multi-class classification model. It then applies several constrained and unconstrained optimization algorithms to fit the model. Finally, it designs a parallel framework for the algorithm, which greatly improves the speed of the classifier. The research consists of the following parts:

(1) The thesis establishes a new linear classifier model for multi-class problems. The model sets labels following Vertex Discriminant Analysis (VDA), a label-setting method that handles multiple classes and cases where the number of predictors exceeds the number of training cases, so that the class labels are distributed more evenly in multi-class problems. The model also uses the common ε-insensitive function for noise reduction, which directly mitigates over-fitting caused by noisy data and sampling error.

(2) To optimize the model, the thesis uses several simple and effective optimization algorithms, both constrained and unconstrained. Forward-backward splitting and FISTA are applied to the non-smooth convex model as an unconstrained problem, and their iteration steps are derived. The ADMM algorithm is also used: the model is recast as a constrained convex problem, and the corresponding iterations are derived. In numerical experiments, the model and optimization algorithms obtained notable results.

(3) The thesis also uses a divide-and-conquer approach to decompose the parent problem into sub-problems that are solved separately. A parallel optimization framework based on ADMM is designed for a hybrid MPI and OpenMP environment, and it achieves considerable computational speedup.
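The label encoding and noise-reduction function described in part (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the vertex coordinates, so the sketch assumes one standard construction of the k unit-norm, equidistant vertices of a regular simplex in (k-1)-dimensional space. The function names `simplex_vertices`, `eps_insensitive`, and `predict`, and the linear map `A`, `b`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def simplex_vertices(k):
    """Return a (k, k-1) array whose rows are unit-norm, equidistant vertices.

    Assumed construction (not taken from the abstract): the first vertex is a
    scaled all-ones vector, the others are c * ones + d * e_j.
    """
    if k < 2:
        raise ValueError("need at least two classes")
    c = -(1.0 + np.sqrt(k)) / (k - 1) ** 1.5
    d = np.sqrt(k / (k - 1.0))
    V = np.full((k, k - 1), c)
    V[0, :] = 1.0 / np.sqrt(k - 1.0)
    # Rows 1..k-1 get the extra d on coordinate j-1.
    V[np.arange(1, k), np.arange(k - 1)] += d
    return V

def eps_insensitive(residuals, eps):
    """Epsilon-insensitive Euclidean loss: max(||r|| - eps, 0) for each row r."""
    return np.maximum(np.linalg.norm(residuals, axis=1) - eps, 0.0)

def predict(X, A, b, V):
    """Assign each row of X to the class whose vertex is nearest to A x + b."""
    scores = X @ A.T + b                      # shape (n, k-1)
    dists = np.linalg.norm(scores[:, None, :] - V[None, :, :], axis=2)
    return dists.argmin(axis=1)

if __name__ == "__main__":
    V = simplex_vertices(3)
    print(V)
    print(eps_insensitive(np.array([[3.0, 4.0], [0.1, 0.0]]), eps=0.5))
```

Because every pair of vertices is the same distance apart, no class is placed closer to one competitor than to another, which is one way to read the abstract's remark about a more even distribution of labels. Residuals shorter than `eps` incur no loss, so small perturbations of the training data do not affect the fit.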
Keywords/Search Tags:supervised machine learning, linear classifier, VDA, optimization algorithm, parallel computing