Research On Eclat Algorithm Based On Flink Platform And Its Application In EMU Fault Association Mining

Posted on:2020-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:T X He

Full Text:PDF

GTID:2392330578452427

Subject:Computer technology

Abstract/Summary:

With the advent of the big data era, distributed computing engines have attracted increasing attention. Apache Flink is a memory-based distributed computing engine with full support for stream processing. It treats batch processing as a special case of stream processing and applies stream-processing concepts to batch workloads, which offers a new approach to data analysis. The traditional association rule algorithms Apriori, FP-Growth and Eclat each have limitations, so choosing a suitable association rule mining algorithm and improving it is one focus of this thesis. EMUs accumulate large amounts of data during daily operation and maintenance, and extracting knowledge from these data to guide EMU operation and maintenance and improve EMU reliability has become an urgent problem. This thesis improves the Eclat algorithm on the Flink platform and applies the improved algorithm to EMU fault association mining. The main work is as follows:

(1) A decision strategy based on comparing specific elements is proposed to quickly determine whether an intersection operation can yield a frequent itemset. Adding this criterion to Eclat allows intersection operations that cannot produce frequent itemsets to be skipped, which reduces the number of iterations and improves the efficiency of the algorithm. Programs implementing the algorithm before and after the improvement were written and run on public datasets in Flink's local execution environment, and comparative experiments verify the effectiveness of the improvement.

(2) A data preprocessing method, field digitization, is proposed: complex text in the EMU data is converted into simple positive integers, and this one-to-one mapping is recorded. After field digitization, each field type corresponds to its own contiguous integer interval, so field types can be filtered by simple numeric comparison. Digitizing the dataset both reduces memory consumption during computation and improves the computational efficiency of the algorithm.

(3) A filtering strategy based on field digitization and the research objectives is proposed to filter out frequent itemsets that contain no fault information. By pruning the frequent itemsets, this strategy reduces the number of itemsets that take part in subsequent intersection operations and improves the efficiency of the algorithm. The validity of the improvement is verified through comparative experiments on the preprocessed EMU data.

(4) A Flink on YARN cluster is deployed to provide an environment for parallel processing of large-scale EMU datasets. Flink has a notion of parallelism, and parallel execution is achieved by setting the parallelism to a value greater than 1. Experiments are repeated with different parallelism settings to explore the relationship between parallelism and the computing efficiency of the platform. Map and Reduce functions are also implemented on the MapReduce platform to compare the computational efficiency of the two platforms under the same conditions.
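The abstract does not spell out which "specific elements" the decision strategy compares, how codes are assigned during field digitization, or exactly where the fault filter is applied. The following single-machine Python sketch is one possible reading of contributions (1) to (3), not the thesis's implementation, and it does not use Flink. Assumptions: transactions are stored vertically as sorted lists of transaction IDs; the hypothetical `upper_bound` check uses only the boundary elements of two lists (plus a binary search) to bound the intersection size, and the intersection is skipped when that bound is below the minimum support; the hypothetical `digitize` gives each field a contiguous interval of width `block`; and the fault field is listed first so that its codes are the smallest. All function names and record fields are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Dict, List, Tuple

Itemset = Tuple[int, ...]


def digitize(records: List[Dict[str, str]], fields: List[str], block: int = 10_000):
    """Field digitization (hypothetical scheme): map each (field, text) pair to a
    positive integer. Field i owns the interval [i*block + 1, (i+1)*block], so the
    field type of a code can be recovered with integer arithmetic."""
    codes: Dict[Tuple[str, str], int] = {}
    used = [0] * len(fields)  # number of codes already issued per field
    transactions: List[List[int]] = []
    for rec in records:
        tx = []
        for i, field in enumerate(fields):
            key = (field, rec[field])
            if key not in codes:
                used[i] += 1
                if used[i] > block:
                    raise ValueError(f"field {field!r} needs more than {block} codes")
                codes[key] = i * block + used[i]
            tx.append(codes[key])
        transactions.append(sorted(tx))
    decode = {code: key for key, code in codes.items()}  # one-to-one reverse mapping
    return transactions, decode


def field_of(code: int, block: int = 10_000) -> int:
    """Index of the field whose interval contains `code` (a numeric comparison)."""
    return (code - 1) // block


def upper_bound(a: List[int], b: List[int]) -> int:
    """Cheap upper bound on the size of the intersection of two sorted tid lists,
    computed from their boundary elements before any full intersection."""
    lo, hi = max(a[0], b[0]), min(a[-1], b[-1])
    if lo > hi:
        return 0  # the tid ranges do not overlap, so the intersection is empty
    # Every common tid must lie in [lo, hi] in both lists.
    in_a = bisect_right(a, hi) - bisect_left(a, lo)
    in_b = bisect_right(b, hi) - bisect_left(b, lo)
    return min(in_a, in_b)


def eclat(transactions: List[List[int]], minsup: int,
          keep_root: Callable[[int], bool] = lambda item: True):
    """Depth-first Eclat over vertical tid lists (requires minsup >= 1).
    Only items accepted by `keep_root` may start an itemset. Items are processed in
    ascending order, so if fault codes occupy the lowest interval, restricting roots
    to fault codes yields exactly the frequent itemsets that contain a fault code."""
    vertical: Dict[int, List[int]] = {}
    for tid, tx in enumerate(transactions):
        for item in tx:
            vertical.setdefault(item, []).append(tid)  # tids arrive in ascending order
    frequent: Dict[Itemset, int] = {}
    stats = {"skipped": 0, "intersected": 0}

    def extend(prefix: Itemset, items: List[Tuple[int, List[int]]],
               allow: Callable[[int], bool]) -> None:
        for i, (item, tids) in enumerate(items):
            if not allow(item):
                continue
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in items[i + 1:]:
                if upper_bound(tids, other_tids) < minsup:
                    stats["skipped"] += 1  # cannot reach minsup: skip the intersection
                    continue
                stats["intersected"] += 1
                members = set(other_tids)
                common = [t for t in tids if t in members]  # keeps ascending order
                if len(common) >= minsup:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix, lambda _: True)

    roots = sorted((item, tids) for item, tids in vertical.items() if len(tids) >= minsup)
    extend((), roots, keep_root)
    return frequent, stats
```

Placing the fault field first is what makes the filter cheap in this sketch: a prefix-based Eclat grows every itemset from its smallest item, so any itemset containing a fault code starts with one, and non-fault items never need to be expanded as roots. Calling `eclat(tx, minsup, keep_root=lambda c: field_of(c, block) == 0)` therefore returns the same result as mining everything and discarding the itemsets without fault information afterwards, while doing fewer intersections.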

Keywords/Search Tags:

Flink platform, Data mining, Association rule mining, Eclat algorithm, EMU

Related items

1	The Research Of Quality Analysis And Evaluation Of Tracks Based On Association Rule Algorithm
2	Research And Implementation Of Key Technologies In The Analysis Of The Relationship Between Faults In Large Data Sets Of EMU
3	Data Mining Based Alarm Correlation Analysis In EMU
4	Research On The Fault Association Analysis For High-speed EMU
5	Application Research Of Data Mining Platform For Wuhan Metro Rail Transit Based On Big Data
6	Study On The Data Mining Technology Of Remote Sensing And Unmanned Aerial Vehicle Low Altitude Remote Sensing Image Based On Distributed Systems
7	Research On Production Scheduling Rule Extraction And Application System Based On Data Mining Technology
8	Research On Big Data Mining Method For Crane Safety Evaluation
9	Analysis Of Road Traffic Accidents Based On Data Mining Approach Of Association Rules
10	Research On Development And Evolution Of Port Group Based On Data Mining