Font Size: a A A

Differential Privacy Association Rules And Application Based On Local-prefix Tree And Report One-sided Noisy Arg-max

Posted on:2024-01-04Degree:MasterType:Thesis
Country:ChinaCandidate:R YangFull Text:PDF
GTID:2568307094484474Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Association rule mining is one of the important tasks of data mining,which can find interesting and credible co-occurrence relationships hidden in datasets.However,directly mining data containing sensitive information may lead to the disclosure of personal privacy.Therefore,how to effectively protect the privacy for association rule mining has important significance and broad application prospects.Differential privacy,as a privacy protection model that is independent of background knowledge and has strict mathematical system proof,can effectively protect personal level information and quantify the degree of privacy protection.Therefore,it is widely introduced into the association rule mining to protect the security of user data.However,existing association rule mining algorithms based on differential privacy often focus on frequent itemset generation,which is prone to discard the low or medium support rules due to the relatively large amount of noise addition,and cannot guarantee the rule search efficiency.Meanwhile,the query sensitivity or sampling probability of such methods is usually related to the size of the candidate space,and too large candidate space will lead to a rapid decrease of data utility.Furthermore,when the privacy budget is low,the Laplace perturbations on support counts can also cause large bias in mining results.In this thesis,we conduct an in-depth study on differential privacy protection of association rules using local prefix trees and reporting one-sided noisy arg-max,and the main work is as follows:(1)A differential privacy association rule mining algorithm based on local prefix trees is proposed to address the problem that existing algorithms ignore low or medium support rules.Firstly,the transaction length constraint method is used to reduce the sensitivity and the loss of order information in the items header table.Secondly,the dataset is converted into vertical format,and the local prefix-tree based on window is used to search for candidate rules,which effectively reduces the rules search space and the repeated calculation of supports;Thirdly,we extract high-quality rules using the technique that combines the exponential mechanism(EM)with reservoir sampling.Quality function is redefined in EM.In the end,the utility and efficiency of the algorithm are verified by theoretical analysis and experiments.(2)Frequent pattern mining is a key step in association rule mining.Due to the data utility is greatly affected by the candidate space in frequent pattern mining,a differential privacy frequent pattern generation algorithm based on reporting one-sided noisy arg-max(RONA)is proposed.Firstly,the RONA selects and reports top-k frequent patterns without additional information loss;Secondly,the generalized random response is used to perturb the supports of the selected patterns,which effectively controls the scale of the added noise and reduces the rule confidence errors.Finally,through theoretical analysis and experimental evaluation,the utility and efficiency of the algorithm are verified.(3)Based on the above research,a prototype system of enterprises data sharing based on differential privacy association rule mining is designed and implemented.And the system structure and key technologies are described in detail.The running results show that the system provides effective privacy protection for the government or enterprises to release associated data to subordinate units or third parties.
Keywords/Search Tags:Privacy protection, Association rule, Differential privacy, Frequent itemsets, Local prefix-tree, Exponential mechanism
PDF Full Text Request
Related items