Font Size: a A A

Research On Improvement Of K-means Clustering Algorithm Based On Differential Privacy

Posted on:2022-09-21Degree:MasterType:Thesis
Country:ChinaCandidate:Q ChengFull Text:PDF
GTID:2518306536954709Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Data mining can discover the inherent interrelationships between data.The K-means algorithm is simple,fast,easy to implement,and efficient in processing massive data,and is widely used in the field of data mining.However,the use of K-means algorithm for data mining may leak the user’s personal privacy,so the combination of privacy protection technology and K-means algorithm,data mining on the premise of protecting data privacy and security has become one of the research hotspots.The differential privacy model has a strict definition of privacy protection.Therefore,applying differential privacy to data mining can resist some problems that traditional privacy protection technologies are difficult to solve in the field of data mining,such as background knowledge attacks and consistency attacks.Differential privacy provides privacy protection for data while maintaining certain statistical characteristics of data,so it is very suitable for data mining to provide privacy protection for data.The traditional differential privacy K-means algorithm randomly selects the initial center points,which may result in poor clustering effect and low data availability.In the existing research,the unreasonable privacy budget allocation scheme will also affect the clustering effect.The main contents of this paper are as follows:(1)In order to solve the problem that the differential privacy K-means clustering algorithm is sensitive to the selection of the initial center points,the distance combined with the sum of squares of errors(SSE)in the cluster we called DAS method is proposed to provide a reasonable initial center points for the differential privacy k-means algorithm.The DAS method first takes a value multiple of k as the initial center points,and first performs a DPK-means clustering,selects the point with the smallest SSE in the cluster as the first initial center point,calculate the distance between the remaining center points and the initial center points and sort them,and select the point farther from the initial center points and with the smaller SSE as the next initial center point,until k initial center points are selected.(2)This paper analyzes the irrationality of the existing privacy budget εallocation method of differential privacy K-means clustering algorithm,and proposes a privacy budget allocation method we called AST based on arithmetic sequence and trisection method,which makes the differential privacy k-means algorithm obtain more privacy budget allocation in the early iteration process,while the privacy budget allocated in the later iteration will not cause the deformation of the centroid.(3)This paper proposes a differential privacy DTDPK-means clustering algorithm based on DAS method and AST method,and compares it with the existing differential privacy K-means clustering algorithm on UCI data set.The results show that DTDPK-means clustering algorithm can improve the usability of data clustering.The differential privacy DTDPK-means algorithm is applied to the recommendation system to protect the privacy of users.The experimental results show that the algorithm will not affect the accuracy of the recommendation results under the premise of protecting the privacy of users.
Keywords/Search Tags:Data mining, K-means, Differential privacy, Privacy budget
PDF Full Text Request
Related items