Research On Data Mining Of Massive Minority Cultural Resources Based On Spark

Posted on:2020-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:M Lei

Full Text:PDF

GTID:2415330599461227

Subject:Software engineering

Abstract/Summary:

In recent years, minority cultures have received growing attention from both the state and society. With the rapid development of information technology in particular, the digitization of minority cultures has advanced considerably, and a wide variety of ethnic cultural information resources have emerged. However, minority cultural resources are large in volume, rich in content and complex in structure. Discovering, acquiring and using the valuable information they contain quickly and accurately has therefore become one of the pressing problems in the informatization of minority cultures. This thesis combines big data processing technology with data mining technology to study methods for mining massive minority cultural resource data, and provides an effective way to promote the protection and inheritance of minority cultures. The main research contents of this thesis include the following parts:

(1) Preprocessing of ethnic cultural resources. The cultural resources of ethnic minorities are mainly distributed, in the form of text, across the local websites of the various ethnic groups. This thesis uses web crawler technology to collect data from these websites and then preprocesses the resulting text: the HTML markup is removed to obtain plain-text resources, which then undergo word segmentation and stop-word removal. Text vector features are generated from the result, and a text vector feature model is constructed.

(2) Spark-based parallelization of the Particle Swarm Optimization (PSO) and k-means algorithms. To address the insufficient data processing efficiency of a single-machine environment, the PSO and k-means algorithms are parallelized by introducing the Spark distributed computing framework. In parallelizing the PSO algorithm, linear parameters are used to reduce the influence of fixed weight parameters, which further improves parallel efficiency.

(3) Spark-based parallelization of the PSO-kmeans algorithm. To overcome the limitations of k-means in determining its cluster centers, the particle swarm optimization algorithm is used to quickly determine the cluster centers for k-means, and a Spark-based PSO-kmeans algorithm is proposed. Because the particle swarm algorithm is prone to falling into local optima, a linear parameter is introduced to speed up the search so that the cluster centers can be obtained quickly. Experiments show that the PSO-kmeans algorithm maintains the stability of the algorithm while reducing running time, and improves accuracy by 3.4% in the clustering task on minority cultural resources.

(4) Implementation of a prototype data mining system for massive minority cultural resources. Based on an analysis of the functions required of the prototype system, a minority data mining platform was built. The platform uses a B/S (browser/server) architecture and implements data collection, data processing and data analysis functions.
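The preprocessing pipeline in part (1) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name its vectorization scheme, so TF-IDF is assumed here, and the tokenizer is a placeholder that splits on non-word characters (real Chinese or minority-language text would need a dedicated word segmenter). All function names are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of currently open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Remove HTML markup and collapse whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text, stop_words):
    # Placeholder segmentation (assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def tfidf_vectors(docs):
    """docs: list of token lists -> (sorted vocabulary, dense TF-IDF vectors)."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        vectors.append([(tf[t] / total) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors
```

In this unsmoothed form, a term that appears in every document receives an IDF of zero and so carries no weight in the vector.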
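The k-means parallelization in part (2) typically follows a map/reduce pattern. In PySpark, the current centroids would usually be shared with `SparkContext.broadcast`. Each point would then be mapped to `(nearest_index, (point, 1))` with `RDD.map` and the pairs aggregated with `RDD.reduceByKey`. The sketch below is an assumption about that general pattern, not the thesis's code. It simulates partitions with plain Python lists so that it runs without a cluster, and the comments note the Spark operation each step stands in for. The function names are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_phase(points, centroids):
    # Spark analogue: rdd.map(lambda p: (nearest(p, bc.value), (p, 1)))
    return [(nearest(p, centroids), (p, 1)) for p in points]


def reduce_by_key(pairs):
    # Spark analogue: .reduceByKey(...) summing coordinate vectors and counts
    acc = {}
    for key, (vec, count) in pairs:
        if key in acc:
            s, n = acc[key]
            acc[key] = ([x + y for x, y in zip(s, vec)], n + count)
        else:
            acc[key] = (list(vec), count)
    return acc


def kmeans_step(partitions, centroids):
    """One Lloyd iteration; each element of partitions is a list of points."""
    mapped = [pair for part in partitions for pair in map_phase(part, centroids)]
    sums = reduce_by_key(mapped)
    new = [list(c) for c in centroids]  # empty clusters keep their old centroid
    for j, (s, n) in sums.items():
        new[j] = [x / n for x in s]
    return new


def kmeans(partitions, centroids, max_iter=20, tol=1e-9):
    """Repeat steps until no centroid moves by more than tol (squared)."""
    for _ in range(max_iter):
        new = kmeans_step(partitions, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(c0, c1))
            for c0, c1 in zip(centroids, new)
        )
        centroids = new
        if shift <= tol:
            break
    return centroids
```

Only the small per-cluster `(sum, count)` pairs need to cross the network, so the driver recomputes the centroids cheaply on each iteration.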
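The PSO-kmeans idea in part (3) can be illustrated with a sequential sketch built on common choices from the literature. It should not be read as the thesis's exact method. The assumptions are as follows. Each particle encodes all k centroids as one flat vector, and its fitness is the within-cluster sum of squared errors (SSE). The abstract says only that "linear parameters" replace a fixed weight, so the inertia weight is assumed to decrease linearly from `w_max` to `w_min`, and `c1 = c2 = 2.0` are conventional defaults. The best centroids found by the swarm then seed a few ordinary k-means iterations. Because each particle's fitness is evaluated independently, that evaluation is the step a Spark version would most naturally distribute. All names are hypothetical.

```python
import random


def sse(flat, points, k, d):
    """Sum of squared distances from each point to its nearest centroid."""
    cents = [flat[i * d:(i + 1) * d] for i in range(k)]
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in cents)
               for pt in points)


def lloyd(points, cents, iters=10):
    """Plain k-means refinement starting from the PSO-chosen centroids."""
    for _ in range(iters):
        groups = [[] for _ in cents]
        for pt in points:
            j = min(range(len(cents)),
                    key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, cents[i])))
            groups[j].append(pt)
        # Empty clusters keep their previous centroid.
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else c
                 for g, c in zip(groups, cents)]
    return cents


def pso_kmeans(points, k, n_particles=20, iters=50,
               w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    d = len(points[0])
    dim = k * d
    # Search box: per-coordinate data bounds, repeated for each of the k centroids.
    lo = [min(p[j] for p in points) for j in range(d)] * k
    hi = [max(p[j] for p in points) for j in range(d)] * k
    vmax = [(h - l) * 0.2 for l, h in zip(lo, hi)]  # velocity clamp
    pos = [[rng.uniform(lo[i], hi[i]) for i in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [sse(p, points, k, d) for p in pos]  # independent per particle
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(iters):
        # Linearly decreasing inertia weight instead of a fixed one.
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax[j], min(vmax[j], v))
                pos[i][j] = max(lo[j], min(hi[j], pos[i][j] + vel[i][j]))
            f = sse(pos[i], points, k, d)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    cents = [gbest[i * d:(i + 1) * d] for i in range(k)]
    return lloyd(points, cents)
```

A common motivation for such hybrids is that plain k-means is sensitive to its initial centroids. Letting the swarm explore first reduces the chance that the final Lloyd iterations settle in a poor local optimum.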

Keywords/Search Tags:

Minority, Data mining, Spark parallel computing

Related items

1	Applying Web Data Mining To The Parallel Corpus: The Automatic Identification And Alignment Of The Corresponding Units
2	Analysis And Research To The Data Of CET-4 Score Based On Data Mining
3	The Effects Of Big Data On Movies And TV Drama
4	Data Mining Application In The Classification Of Music
5	A Research On Scoring SAQs Of Online Listening By Data Mining
6	Research And Application Of Data Mining In College English Teaching And Evaluation
7	Application Of Several Data Mining Algorithms In The Analysis Of Ancient Ceramics
8	Ethical Analysis Of Open Source Data Mining
9	A Report On The Translation Of Data Science For Business-What You Need To Know About Data Mining And Data-analytic Thinking(Chapter 3)
10	The Virtue Of Forgetting In The Age Of Ubiquitous Computing