| The wide applicability of data mining in various fields of society has aroused great concern among relevant researchers.Frequent itemset mining is one of the very popular data mining techniques in the current era,and it plays a vital role in data mining topics.The items and their combinations that meet the support threshold in the big data text are mined to provide data support for various mining tasks while taking into account the accuracy and performance of mining.The classic Apriori algorithm,FP-group algorithm,frequent itemset mining algorithm based on bit combination,and frequent itemset mining algorithm based on linear table proposed by previous researchers are analyzed in this paper.By comparing the advantages and disadvantages of these algorithms,a frequent itemset mining algorithm based on dynamic grouping(DGFIM)is proposed,which uses a novel MPL(multi-partition list)data structure.The MPL is composed of arrays,which solves the limitations brought by maintaining a large number of pointers in the traditional tree structure.In addition,The MPL stores the minimum effective information required in the mining process and supports linear access.The frequent itemset mining algorithm based on dynamic grouping uses the coverage formula to dynamically group MPL according to the different characteristics of the dataset.According to the most significant bit information of the binary bits corresponding to the candidate itemset,the position of the MPL to be mined is judged,and at the same time,the length of the MPL to be traversed is judged by the low-level feature information of the binary bits.In addition,according to whether the linear table structure is used when constructing MPL,that is,MPL_LT(MPL with linear table)and MPL_NLT(MPL without linear table).The linear table frequent itemset mining algorithm based on dynamic grouping(DGFIM-LT)and the frequent itemset mining algorithm that directly constructs MPL_NLT(DGFIM-NLT)is discussed separately in this paper.The linear table frequent itemset mining algorithm based on dynamic grouping generates a highly shared linear table in the process of constructing MPL_LT.The linear table saves all transaction information in the original dataset and its binary conversion form after filtering and sorting in descending order.A lower bit in binary indicates higher support for the item represented by that bit.The MPL_LT constructed by the linear table is more compact,and the node chain of each MPL_LT is also kept in order,which provides great convenience for mining.However,the linear table does not play any substantive role when mining,but occupies a large amount of memory space.Therefore,a frequent itemset mining algorithm that directly constructs MPL_NLT is proposed in this paper.The preprocessed original dataset is mapped to MPL_NLT one by one,and the upstream operation is performed on it to obtain the final MPL_NLT.Compared with the construction of MPL_LT,this process saves the memory overhead caused by generating linear tables.However,the upstream process involves bottom-up iterative operations,which affects mining performance to a certain extent.In addition,two strategies are proposed to optimize the algorithm,namely group pruning strategy and itemset ascending order mining.Using these two optimization strategies,the candidate itemsets will be filtered and eliminated in time,which further reduces the search space.The algorithm proposed in this paper is tested on a variety of public datasets.The results show that the DGFIM-LT algorithm has a very good running speed,especially when the support threshold is small,the effect will be more obvious.Followed by the DGFIM-NLT algorithm.Among them,the operating efficiency of DGFIM-LT algorithm is 11% higher than that of FMLT algorithm on average,29% higher than that of BCLT-O algorithm,and 2-3 times higher than that of FP-growth.However,on some datasets,the DGFIM-NTL algorithm has a similar running speed to the FMLT algorithm,BCLT-O algorithm,and FP-growth algorithm.In terms of memory usage,the DGFIM-NLT algorithm has a better effect than the DGFIM-LT algorithm.The above two algorithms are obviously better than the FMLT algorithm and the FP-growth algorithm,comparable to the BCLT-O algorithm,but there is still a certain gap compared with the bit combination algorithm.However,as the pace of technology update iterations gets faster and faster,it is very feasible to exchange a small amount of space for an incremental increase in speed. |