Research And Application Of Enterprise Big Data Classification Problem Based On Clustering Algorithm

Posted on: 2023-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: C F Luo
Full Text: PDF
GTID: 2568306752477384
Subject: Mathematics
Abstract/Summary:
In recent years, with the continuous advancement of data science, the volume of data of all kinds has kept increasing. How to discover the characteristics of data in different industries, and how to mine that data quickly and efficiently, have become urgent problems. Traditional clustering methods have limitations when dealing with enterprise big data. To address over-fitting on the big data of consumer enterprises, this research proposes combining a partition clustering algorithm with a tree model algorithm; to address low accuracy on the big data of industrial enterprises, it proposes fusing density clustering with a tree model algorithm. The main research results of this thesis are as follows:

(1) Aiming at the over-fitting problem in the big data classification process of consumer enterprises, an ensemble algorithm of XGBoost base classifiers based on Kmeans clustering is proposed. The XGBoost algorithm has high accuracy but suffers from over-fitting, whereas the clustering algorithm has low accuracy but does not over-fit. This thesis therefore fuses the two algorithms to combine their advantages. The proposed algorithm is applied to the big data of consumer enterprises, studied and verified in depth, and compared with existing algorithms. The experimental results show that the proposed algorithm can effectively solve the problem of over-fitting in the big data classification process of consumer enterprises.

(2) Aiming at the low accuracy of big data classification results of industrial enterprises, a method based on the fusion of OPTICS clustering and the XGBoost algorithm is proposed. First, a fundamental analysis of the raw data is performed. Then, the data are clustered using density clustering. Finally, the clustering results are predicted and analyzed using the XGBoost algorithm. The proposed fusion of the density clustering algorithm and the XGBoost algorithm is applied to industrial big data, and further research and verification are carried out. The experimental results show that the proposed algorithm improves the accuracy of classification results on industrial big data to a certain degree.

The clustering-based methods proposed in this thesis for the over-fitting and low-accuracy problems in enterprise big data classification can effectively improve the decision-making ability and computational efficiency of traditional classification algorithms. They provide a novel solution for the analysis and processing of big data.
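The abstract does not say how the clustering step and the tree model are combined. One plausible reading of "base classifiers based on clustering" is to partition the data first, train one base classifier per cluster, and route each sample to the classifier of its nearest centroid. The sketch below illustrates that reading only, and is not the thesis's implementation. To stay runnable with the Python standard library alone, it swaps in a small hand-written k-means and a one-split decision stump as a stand-in for XGBoost; every function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def fit_stump(X, y):
    """Stand-in base classifier (NOT XGBoost): best single-feature threshold."""
    majority = Counter(y).most_common(1)[0][0]
    best_err = sum(lab != majority for lab in y)
    best_rule = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if err < best_err:
                best_err, best_rule = err, (f, t, l_lab, r_lab)
    if best_rule is None:
        return lambda x: majority
    f, t, l_lab, r_lab = best_rule
    return lambda x: l_lab if x[f] <= t else r_lab


def fit_cluster_ensemble(X, y, k=2, seed=0):
    """Cluster first, then train one base classifier per cluster."""
    centroids = kmeans(X, k, seed=seed)
    fallback = fit_stump(X, y)  # used only if a cluster is empty
    models = []
    for i in range(k):
        idx = [j for j, x in enumerate(X) if nearest(x, centroids) == i]
        if idx:
            models.append(fit_stump([X[j] for j in idx], [y[j] for j in idx]))
        else:
            models.append(fallback)
    return lambda x: models[nearest(x, centroids)](x)


# Toy data (made up): two well-separated groups, each with its own label rule.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 1, 0, 1, 0, 0, 1, 1]

ensemble = fit_cluster_ensemble(X, y, k=2)
single = fit_stump(X, y)
ensemble_hits = sum(ensemble(x) == lab for x, lab in zip(X, y))
single_hits = sum(single(x) == lab for x, lab in zip(X, y))
print("per-cluster ensemble correct:", ensemble_hits, "of", len(y))
print("single global stump correct:", single_hits, "of", len(y))
```

On this toy data the per-cluster classifiers can each learn a different rule, which a single global stump cannot express. With the real libraries, the stump would be replaced by an XGBoost classifier, and the k-means step by a Kmeans or OPTICS implementation for the two methods described above.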
Keywords/Search Tags:Clustering algorithm, Classification problem, XGBoost algorithm, Kmeans clustering, OPTICS clustering