Research On Frequent Itemset Mining Method Based On Dynamic Groupin

Posted on:2024-07-30

Degree:Master

Type:Thesis

Country:China

Candidate:W H Xu

Full Text:PDF

GTID:2568306920475064

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

The wide applicability of data mining in various fields of society has aroused great concern among relevant researchers.Frequent itemset mining is one of the very popular data mining techniques in the current era,and it plays a vital role in data mining topics.The items and their combinations that meet the support threshold in the big data text are mined to provide data support for various mining tasks while taking into account the accuracy and performance of mining.The classic Apriori algorithm,FP-group algorithm,frequent itemset mining algorithm based on bit combination,and frequent itemset mining algorithm based on linear table proposed by previous researchers are analyzed in this paper.By comparing the advantages and disadvantages of these algorithms,a frequent itemset mining algorithm based on dynamic grouping(DGFIM)is proposed,which uses a novel MPL(multi-partition list)data structure.The MPL is composed of arrays,which solves the limitations brought by maintaining a large number of pointers in the traditional tree structure.In addition,The MPL stores the minimum effective information required in the mining process and supports linear access.The frequent itemset mining algorithm based on dynamic grouping uses the coverage formula to dynamically group MPL according to the different characteristics of the dataset.According to the most significant bit information of the binary bits corresponding to the candidate itemset,the position of the MPL to be mined is judged,and at the same time,the length of the MPL to be traversed is judged by the low-level feature information of the binary bits.In addition,according to whether the linear table structure is used when constructing MPL,that is,MPL＿LT(MPL with linear table)and MPL＿NLT(MPL without linear table).The linear table frequent itemset mining algorithm based on dynamic grouping(DGFIM-LT)and the frequent itemset mining algorithm that directly constructs MPL＿NLT(DGFIM-NLT)is discussed separately in this paper.The linear table frequent itemset mining algorithm based on dynamic grouping generates a highly shared linear table in the process of constructing MPL＿LT.The linear table saves all transaction information in the original dataset and its binary conversion form after filtering and sorting in descending order.A lower bit in binary indicates higher support for the item represented by that bit.The MPL＿LT constructed by the linear table is more compact,and the node chain of each MPL＿LT is also kept in order,which provides great convenience for mining.However,the linear table does not play any substantive role when mining,but occupies a large amount of memory space.Therefore,a frequent itemset mining algorithm that directly constructs MPL＿NLT is proposed in this paper.The preprocessed original dataset is mapped to MPL＿NLT one by one,and the upstream operation is performed on it to obtain the final MPL＿NLT.Compared with the construction of MPL＿LT,this process saves the memory overhead caused by generating linear tables.However,the upstream process involves bottom-up iterative operations,which affects mining performance to a certain extent.In addition,two strategies are proposed to optimize the algorithm,namely group pruning strategy and itemset ascending order mining.Using these two optimization strategies,the candidate itemsets will be filtered and eliminated in time,which further reduces the search space.The algorithm proposed in this paper is tested on a variety of public datasets.The results show that the DGFIM-LT algorithm has a very good running speed,especially when the support threshold is small,the effect will be more obvious.Followed by the DGFIM-NLT algorithm.Among them,the operating efficiency of DGFIM-LT algorithm is 11% higher than that of FMLT algorithm on average,29% higher than that of BCLT-O algorithm,and 2-3 times higher than that of FP-growth.However,on some datasets,the DGFIM-NTL algorithm has a similar running speed to the FMLT algorithm,BCLT-O algorithm,and FP-growth algorithm.In terms of memory usage,the DGFIM-NLT algorithm has a better effect than the DGFIM-LT algorithm.The above two algorithms are obviously better than the FMLT algorithm and the FP-growth algorithm,comparable to the BCLT-O algorithm,but there is still a certain gap compared with the bit combination algorithm.However,as the pace of technology update iterations gets faster and faster,it is very feasible to exchange a small amount of space for an incremental increase in speed.

Keywords/Search Tags:

frequent itemset mining, MPL, dynamic grouping, coverage, upstream operation

PDF Full Text Request

Related items

1	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
2	Research On Mining Frequent Itemsets Algorithm Based On Bittable
3	The Research On IDS Based On Mining Max Frequent Itemset Using Big Step Pruning Strategy
4	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
5	Research On Frequent Itemset Mining Algorithm And Its Parallelization Based On Spark
6	Research On Intersection Cache Based On Frequent Itemset Mining In Search Engines
7	Research On Frequent Itemset Mining Based On Differentially Private Model
8	Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining
9	Research And Realization Of Parallel Algorithm For Mining Frequent Closed Itemsets
10	An Algorithm For Mining Frequent Itemsets From Data Streams