Research On Top-k High Fuzzy Utility Itemsets Mining Algorithm

Posted on:2024-06-18

Degree:Master

Type:Thesis

Country:China

Candidate:W Zhou

GTID:2568307157999529

Subject:Computer Science and Technology

Abstract/Summary:

High fuzzy utility itemset mining is an active research topic in data mining. Compared with traditional frequent itemset mining, which provides only frequency information, high fuzzy utility itemset mining also reports the utility of each itemset, giving users more useful information. In practice, users want not only the utility of the itemsets in the mining results but also the quantity of each item in an itemset, to support better decision-making. High fuzzy utility itemset mining algorithms meet this need.

Like high utility itemset mining algorithms, existing high fuzzy utility itemset mining algorithms face the problem of threshold selection. If the fuzzy utility threshold is set too low, too many high fuzzy utility itemsets are generated; if it is set too high, it is likely that none are generated. In both cases it is difficult for users to find interesting itemsets. In addition, existing algorithms still generate too many candidate itemsets and explore too large a search space, so improved data structures and appropriate pruning strategies are urgently needed. To address these problems, this thesis makes the following contributions:

(1) Drawing on the idea of traditional top-k high utility itemset mining, the threshold selection problem is transformed into the problem of choosing k, the number of itemsets the user expects to mine, and the concept of top-k high fuzzy utility itemset mining is proposed. Two top-k high fuzzy utility itemset mining algorithms are proposed, namely TKHFU-DPL and ITKHFU-DPL.

(2) A Top-k High Fuzzy Utility itemset mining algorithm based on Double Pruning and List structure, TKHFU-DPL, is proposed. The algorithm does not require a threshold to be set in advance; it directly mines the k itemsets with the highest fuzzy utility, where k is chosen by the user. An effective compressed data structure and a fuzzy itemset utility list are proposed to store the potential top-k itemsets and their utility information generated during mining, which avoids complex join operations when higher-level itemsets are generated from lower-level ones. Two effective pruning strategies are proposed and applied to the list structure, reducing the generation of unpromising itemsets. Experimental results show that the proposed algorithm outperforms the latest existing algorithm in running time, memory consumption, and scalability.

(3) An improved TKHFU-DPL algorithm, ITKHFU-DPL, is proposed. It addresses the slow threshold raising of the TKHFU-DPL algorithm. The algorithm uses two effective threshold raising strategies to raise the threshold quickly before the formal mining phase, which eliminates a large number of meaningless itemsets. A compressed data structure, EFUCS, is used, designed specifically to hold the fuzzy utility of 2-itemsets. An effective pruning strategy, TWFUPURUNE, is added; in conjunction with EFUCS, it further prunes unpromising itemsets and avoids the heavy memory consumption caused by generating meaningless list structures. Experiments on two real datasets and a set of synthetic datasets show that the improved algorithm is superior to the TKHFU-DPL algorithm in running time, memory usage, and scalability.
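
The general top-k idea described above (no preset threshold, an internal threshold that rises as better itemsets are found, and upper-bound pruning) can be illustrated with a minimal Python sketch. This is not the TKHFU-DPL or ITKHFU-DPL algorithm: it assumes plain positive item utilities in place of fuzzy utilities, uses a simple transaction-weighted upper bound rather than the thesis's pruning strategies, and omits the utility list and EFUCS structures. The names TopKBuffer and mine_top_k are hypothetical.

```python
import heapq


class TopKBuffer:
    """Keeps the k best (utility, itemset) pairs seen so far (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap, so the weakest kept itemset sits at index 0

    @property
    def threshold(self):
        # The internal threshold stays at 0 until k itemsets have been collected,
        # then equals the k-th best utility and can only rise.
        return self._heap[0][0] if len(self._heap) == self.k else 0.0

    def offer(self, itemset, utility):
        entry = (utility, tuple(sorted(itemset)))
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif utility > self.threshold:
            heapq.heapreplace(self._heap, entry)  # evict weakest, raising the threshold

    def results(self):
        return sorted(self._heap, reverse=True)


def mine_top_k(transactions, k):
    """transactions: list of dicts mapping item -> positive utility in that transaction."""
    buffer = TopKBuffer(k)
    items = sorted({item for t in transactions for item in t})

    def search(prefix, start, tids):
        for idx in range(start, len(items)):
            item = items[idx]
            new_tids = [t for t in tids if item in transactions[t]]
            if not new_tids:
                continue
            itemset = prefix + [item]
            utility = sum(transactions[t][i] for t in new_tids for i in itemset)
            buffer.offer(itemset, utility)
            # Upper bound on any superset: total utility of the transactions
            # that contain the current itemset (assumes positive utilities).
            bound = sum(sum(transactions[t].values()) for t in new_tids)
            if bound > buffer.threshold:
                search(itemset, idx + 1, new_tids)

    search([], 0, list(range(len(transactions))))
    return buffer.results()


# Illustrative data only; the utilities are made up for this sketch.
sample = [
    {"a": 5, "b": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
    {"a": 10, "b": 6, "c": 4},
]
print(mine_top_k(sample, 3))
# [(22, ('a', 'b')), (20, ('a', 'b', 'c')), (19, ('a', 'c'))]
```

The user supplies only k. The threshold starts at 0 and rises each time a better itemset displaces the weakest one in the buffer, which is why raising it early, as ITKHFU-DPL aims to do, lets the pruning test discard more of the search space.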

Keywords/Search Tags:

data mining, itemset mining, high fuzzy utility itemset, fuzzy itemset utility list, pruning strategy, threshold raising strategy

Related items

1	Algorithmic Research For Mining High Average Fuzzy Utility Itemset With Multiple Minimum Utility Thresholds
2	Research On Novel Methods In Utility Pattern Mining
3	Research On High Utility Pattern Mining Methods In Data Stream
4	Research On Key Technologies Of High Utility Itemset Mining
5	Research On High Utility Pattern Mining Technology
6	Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining
7	Research On GPU-Based Heuristic Algorithm For Mining High Utility Itemset
8	Research On Algorithm For Mining High Utility Itemset With Negative Item Values
9	Research On Frequent And Closed High Utility Itemset Mining Algorithm Based On Spark
10	Algorithmic Research For Mining Unexpected High Utility Itemsets In Dynamic Environments