| Pattern mining is an important research field in data mining,High Utility Itemset Mining(HUIM)is popular in pattern mining problem.Exact algorithms are first proposed in HUIM to mine all the itemsets in transaction datasets.However,the performance of these exact algorithms decrease sharply as the number of transactions and distinct items increases due to the enumerate of large non-existent itemsets.Evolutionary Algorithms(EAs)are used in existing literatures to solve the performance bottleneck of exact algorithms.However,the EAs need to scan the dataset many times in iterative steps,which leads to serious time consuming.In recent years,Graphic Processing Unit(GPU)has been widely used to greatly improve the running speed of algorithms in parallel.GPU is more suitable for handling large-scale parallel tasks than CPU.GPU and evolutionary algorithm are combined in our works to mine HUIM problems.In this paper,evolution and heuristic strategies based on GPU platform are designed to run in a shorter time while keep the good mining quality for different dataset scales.First,in small scale datasets,the performance of existing EAs is better than traditional exact algorithms when mining high-utility itemsets(HUIs).However,EAs is time consuming in iterative steps,the running performance is worse than CPU parallel exact algorithms.In order to improve the running speed of EAs and maintain good mining quality,a parallel genetic algorithm(GA)based on the platform of GPU to mining HUIM(PHUI-GA)is proposed in this paper.The selection,crossover,mutation and fitness evaluation step of GA are designed on GPU to greatly improve the running performance.An improved initialization strategy,a sorted promising encoding vector(SPEV)strategy,and an elite strategy with population diversity are presented to improve the convergence performance of GA.Experiments show that PHUI-GA is faster than the existing EAs in same iterative steps.Compared with the algorithm before parallel,the fitness evaluation step can be accelerated up to 250 times,and PHUI-GA is up to25 times faster than existing EAs.When mining 90% HUIs,the running speed of PHUI-GA is up to 381 times faster than the existing EAs and up to 47 times faster than CPU parallel exact algorithm.Thanks to the improved strategies proposed in PHUI-GA,the algorithm maintains the best mining quality in more than half of the datasets.Second,existing EAs have good mining quality in small scale datasets.However,the scale of these datasets is limited,the number of items is only tens to hundreds,the number of transactions is thousands to millions,and the transaction density is up to 50%.However,as the size of dataset increased,the running speed of the existing EAs decreases sharply especially for large sparse transaction datasets with the number of items ranging from thousands to tens of thousands,the number of transactions up to millions,and the density is lower than 0.5%.Meanwhile,the mining quality of existing EAs decrease due to the lack of useful mining strategies for large datasets.At present,the best-performance exact algorithms have shown good performance on large sparse datasets.However,when the minimum threshold decreases,the running performance of these exact algorithm decreased sharply due to enumerate large itemsets.Therefore,a parallel heuristic algorithm for mining high utility itemsets for large datasets(PHA-HUIM)is proposed in this paper to further improve the parallel performance of GPU.Inspired by the EAs,PHA-HUIM simplified the iterative steps and designed it into three steps,which are search strategy,fitness evaluation step and population communication step.The three steps are run on GPU without frequent CPU/GPU interaction in iterative steps.The redundant structures in existing EAs and ineffective strategies are removed.In the search strategy,three search spaces are used to improve the population diversity,while the unbalanced strategy is used to find HUIs more quickly with a local search.The load balancing strategy is proposed in the fitness evaluation step to further improve GPU parallel performance.Ring topology is used to retain high utility individuals for the next iteration.Experiments show that the mining quality of PHA-HUIM is significantly better than existing EAs in large sparse datasets.To the best of our knowledge,when mining at least 50% of the HUIs,PHA-HUIM is faster than all the best-performing HUIM algorithms,including EAs,exact algorithms,and a CPU parallel exact algorithm.The speed up performance of PHA-HUIM is significant improved when mining a very large dataset. |