
Research On Data Mining Of Massive Minority Cultural Resources Based On Spark

Posted on: 2020-10-11  Degree: Master  Type: Thesis
Country: China  Candidate: M Lei  Full Text: PDF
GTID: 2415330599461227  Subject: Software engineering
Abstract/Summary:
In recent years, minority cultures have received growing attention from the state and society. With the rapid development of information technology, the digitalization of minority cultures has advanced steadily, and a wide range of ethnic cultural information resources has emerged. However, given the large volume of minority cultural resources, with their rich content and complex structure, how to quickly and accurately discover, acquire, and use valuable information has become one of the urgent problems in the development of minority informatization. This thesis combines big data processing technology with data mining technology to study mining methods for massive minority cultural resources, and provides an effective way to promote the protection and inheritance of minority cultures. The main research contents of this thesis are as follows:

(1) Preprocessing of ethnic cultural resources. The cultural resources of ethnic minorities are mainly distributed across the local websites of the various nationalities in the form of text. This thesis uses web crawler technology to crawl data from these websites and then preprocesses the collected text: it removes the HTML markup to obtain plain text, performs word segmentation, removes stop words, generates text vector features, and constructs a text vector feature model.

(2) Spark-based parallelization of the Particle Swarm Optimization (PSO) and k-means algorithms. To address the insufficient data processing efficiency of a single-machine environment, the PSO and k-means algorithms are parallelized on the Spark distributed computing framework. In the parallelization of the PSO algorithm, a linear parameter is used to reduce the influence of fixed weight parameters, which further improves parallel efficiency.

(3) Parallelization of the PSO-kmeans algorithm based on Spark. To overcome the limitation of the k-means algorithm with respect to its clustering centers, a particle swarm optimization algorithm is used to quickly determine the clustering centers for k-means, and a Spark-based PSO-kmeans algorithm is proposed. Because particle swarm optimization tends to fall into local optima, the linear parameter is introduced to speed up the search so that the cluster centers can be obtained quickly. Experiments show that the PSO-kmeans algorithm remains stable while reducing running time, and that accuracy improves by 3.4% in the clustering task on minority cultural resources.

(4) Implementation of a prototype data mining system for massive minority cultural resources. Based on an analysis of the functions required of the prototype system, a minority data mining platform was built. The platform uses a B/S (browser/server) architecture and implements data collection, data processing, and data analysis functions.
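
The idea in (2) and (3) can be illustrated with a minimal single-machine sketch; it is not the thesis's Spark implementation. It assumes that the "linear parameter" means a linearly decreasing inertia weight, which is a common PSO variant but is not confirmed by the abstract. The function names (`inertia`, `pso_centers`, `kmeans`, `pso_kmeans`) and the constants (`w_max=0.9`, `w_min=0.4`, `c1=c2=2.0`, swarm size, iteration counts) are all hypothetical choices made for illustration. Each particle encodes one candidate set of k cluster centers, fitness is the sum of squared distances to the nearest center, and the best particle seeds ordinary k-means.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centers):
    """Fitness: sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def inertia(t, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (assumed reading of 'linear parameter')."""
    if max_iter <= 1:
        return w_min
    return w_max - (w_max - w_min) * t / (max_iter - 1)

def pso_centers(points, k, n_particles=10, max_iter=30, c1=2.0, c2=2.0, seed=0):
    """Search for k initial centers; each particle is a list of k centers."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise every particle with k randomly sampled data points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_fit = [sse(points, particle) for particle in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for t in range(max_iter):
        w = inertia(t, max_iter)
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update with the time-varying weight w.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest

def kmeans(points, centers, max_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def pso_kmeans(points, k, seed=0):
    """Use the PSO result as the starting centers for k-means."""
    return kmeans(points, pso_centers(points, k, seed=seed))
```

Calling `pso_kmeans(points, k)` on a list of numeric feature vectors returns k centers. In a distributed setting, the per-point distance computations inside `sse` and the assignment step of `kmeans` are the parts that would naturally be spread across workers; how the thesis partitions this work on Spark is not described in the abstract.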
Keywords/Search Tags: Minority, Data mining, Spark parallel computing