Research On Algorithm Of Mining Frequent Itemsets From Uncertaln Data

Posted on:2013-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:J M Wang

Full Text:PDF

GTID:2218330374461332

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Uncertain data is widespread in some application fields such as sensor network, RFID (radio frequent identification), Web applications and so on. The research on uncertain data mining has been become a new hotspot in the area of data mining recently. Uncertain data mining includes clustering, classification, frequent itemsets mining and outlier detection, etc, among which the frequent itemsets mining is one of the focus issues.This paper elaborates the causes of uncertainty and the data types of uncertain data, after that the research status of uncertain data mining and the classical algorithms of mining frequent itemsets from certain data (traditional data) is summarized. And then some new algorithms used to mine frequent itemsets from uncertain data (such as U-Apriori and UF-growth) and uncertain data stream (such as UF-streaming and SUF-growth) is mainly discussed. The U-Apriori and UF-growth are the expansion and improvement of the classical algorithm Apriori and FP-growth respectively, while the UF-streaming and SUF-growth are all based on the tree structure. All of them are efficient algorithms in the area of mining frequent itemsets from uncertain data. But through several analysis we found that the research on mining frequent itemsets from uncertain data mostly concentrated on the complete frequent itemsets, and there is few algorithm used to mine maximal or closet ones.A new algorithm UMF-growth used to mine maximal frequent itemsets from uncertain data is proposed in the paper. The main idea of the algorithm is detailed illustrated with an example. The UMF-growth is based on the UF-growth algorithm and only need to scan the original database twice to complete the mining of maximal frequent itemsets. Different with the UF-growth algorithm, the mining process of the UMF-growth is divided into two steps:The first step is to find out all of the local maximal frequent itemsets with the frequent1-item as suffixes, respectively. The second step is to insert all of the local maximal frequent itemsets into the UMF-Tree in a similar fashion as the construction of FP-Tree. And then we will capture all of the maximal frequent itemsets of the original database. In order to further improve the efficiency of our algorithm, an improvement strategy, namely to discretize the expected support of the UF-Tree node up to n decimal places is proposed. The experimental results show that the performance of UMF-growth is very good and especially suitable for the dense database, and the improvement strategy could effectively improve the efficiency.

Keywords/Search Tags:

uncertain data, frequent itemsets, UF-growth algorithm, maximal frequentitemsets

PDF Full Text Request

Related items

1	Research On Algorithms For Frequent Itemsets Mining Over Uncertain Data Streams
2	Research On Algorithms For Mining Maximal Frequent Itemsets
3	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
4	Research On Algorithm For Mining Frequent Itemsets Of Uncertain Data
5	Research On Mining Algorithm Of Maximal Frequent Itemsets
6	Research On Algorithm For Mining Maximal Frequent Itemsets Over Data Streams
7	Research On Maximum Frequent Itemsets Based On FP-Tree
8	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
9	Research On The Algorithm Of Mining Frequent Itemsets From Uncertain Data Based On The Tree
10	An Algorithm Of Maximal Frequent Itemsets Mining Based On Dynamic Reordering