| High fuzzy utility itemset mining is an active research topic in data mining. Compared with traditional frequent itemset mining, which only provides frequency information, the results of high fuzzy utility itemset mining also contain the utility information of itemsets, which gives users more useful information. In practice, users want not only the utility of the itemsets in the mining results but also the quantity of each item in an itemset to support better decision-making. High fuzzy utility itemset mining algorithms meet this need. Like high utility itemset mining algorithms, existing high fuzzy utility itemset mining algorithms suffer from the problem of threshold selection. If the fuzzy utility threshold is set too low, too many high fuzzy utility itemsets are generated. If it is set too high, it is likely that no high fuzzy utility itemsets are generated at all. In both cases, it is difficult for users to find interesting itemsets. In addition, existing high fuzzy utility itemset mining algorithms still generate too many candidate itemsets and explore too large a search space, so improved data structures and appropriate pruning strategies are urgently needed. To address these problems, this paper makes the following contributions: (1) Drawing on the idea of traditional top-k high utility itemset mining, the problem of selecting a threshold is transformed into the problem of specifying the number k of itemsets that the user expects to mine, and the concept of top-k high fuzzy utility itemset mining is proposed. Two top-k high fuzzy utility itemset mining algorithms are proposed, namely TKHFU-DPL and ITKHFU-DPL. (2) A Top-k High Fuzzy Utility itemset mining algorithm based on Double Pruning and List structure, namely TKHFU-DPL, is proposed. The algorithm does not require a threshold to be set in advance; instead, it directly mines the k itemsets with the highest fuzzy utility according to the user’s needs. An
effective compressed data structure and a fuzzy itemset utility list are proposed to store the potential top-k itemsets and their utility information generated during mining, which avoids complex join operations when higher-level itemsets are generated from lower-level itemsets. Two effective pruning strategies are proposed and applied to the list structure, which reduces the generation of unpromising itemsets. Experimental results show that the proposed algorithm outperforms the latest existing algorithm in terms of running time, memory consumption and scalability. (3) An improved TKHFU-DPL algorithm (ITKHFU-DPL) is proposed. It addresses the problem that the threshold rises too slowly in the TKHFU-DPL algorithm. The algorithm uses two effective threshold-raising strategies to raise the threshold quickly before the formal mining phase, which eliminates a large number of meaningless itemsets. A compressed data structure, EFUCS, specifically designed to hold the fuzzy utility of 2-itemsets, is used. An effective pruning strategy, TWFUPURUNE, is added; in conjunction with EFUCS, it further prunes unpromising itemsets and avoids the heavy memory consumption caused by generating meaningless list structures. Experiments on two real datasets and a set of synthetic datasets show that the improved algorithm proposed in this chapter is superior to the TKHFU-DPL algorithm in terms of running time, memory usage and scalability. |
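
The general idea of replacing a user-chosen threshold with a user-chosen k can be illustrated with a small sketch. It is not the TKHFU-DPL or ITKHFU-DPL implementation: the class name `TopKBuffer`, its methods, and the utility values are hypothetical, and the fuzzy utility of each itemset is assumed to be already computed. The sketch only shows how a min-heap of size k yields an internal threshold that rises during mining and can be used to discard unpromising candidates.

```python
import heapq


class TopKBuffer:
    """Hypothetical helper: keeps the k highest-utility itemsets seen so far."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._heap = []        # min-heap of (utility, itemset) pairs
        self.threshold = 0.0   # internal threshold, raised as the heap fills

    def offer(self, itemset, utility):
        """Try to add an itemset; return True if it enters the top-k."""
        entry = (utility, tuple(sorted(itemset)))
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif utility > self._heap[0][0]:
            # Replace the current k-th best itemset.
            heapq.heapreplace(self._heap, entry)
        else:
            return False
        if len(self._heap) == self.k:
            # The k-th highest utility becomes the new threshold.
            self.threshold = self._heap[0][0]
        return True

    def can_prune(self, upper_bound):
        """A candidate whose utility upper bound is below the threshold
        cannot enter the top-k, so it need not be expanded."""
        return upper_bound < self.threshold

    def results(self):
        """Return the stored itemsets, highest utility first."""
        return sorted(self._heap, reverse=True)


# Made-up utilities, for illustration only.
candidates = [
    (("a",), 4.0),
    (("b",), 9.5),
    (("a", "b"), 12.0),
    (("c",), 1.5),
    (("b", "c"), 7.0),
]

buf = TopKBuffer(k=2)
for itemset, utility in candidates:
    buf.offer(itemset, utility)

print(buf.threshold)   # 9.5
print(buf.results())   # [(12.0, ('a', 'b')), (9.5, ('b',))]
```

The faster the threshold rises, the more candidates `can_prune` can reject, which is the motivation the abstract gives for raising the threshold before the formal mining phase.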