With the rapid development of digitization, vast amounts of information are generated every day, and a large part of it contains sensitive data, such as patient records from hospitals. This information has great scientific value: for example, the prevalence of a disease can be inferred from the records of local hospitals. Publishing it directly, however, would leak privacy, so using this valuable information without exposing sensitive details is a challenging problem. Differential privacy is a mechanism with a strong, provable privacy guarantee. By injecting random noise into the original data, it makes it provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. Differential privacy has so far been applied in many fields, such as histogram publication, batch queries, top-k data mining, linear regression, and classification.

Because differential privacy protects privacy by adding noise to the original data, the utility of the data is reduced, and the main objective in differentially private publishing is to maximize accuracy. Histogram publication is widely used in statistics, so it has become a hotspot of differential privacy research. The main optimization strategy in histogram publication is to first merge adjacent bins of the original data and then add noise to the merged bins. However, the noise reduction achieved by previous algorithms is limited, and they exhibit several defects.

This thesis studies histogram publication based on differential privacy in depth, analyzes the different kinds of algorithms published in recent years, identifies the main causes of their defects, and proposes an algorithm to improve the results. An existing histogram publication algorithm can be divided into two parts: a grouping part and a noise-adding part.
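To make the noise-injection idea concrete, here is a minimal sketch of publishing a histogram under differential privacy. It assumes the standard Laplace mechanism (the specific mechanism is not named in the text above): for count queries over a histogram, adding or removing one record changes exactly one bin by 1, so the global sensitivity is 1 and each bin receives Laplace noise with scale 1/epsilon.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from the Laplace(0, scale) distribution
    via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_histogram(counts, epsilon: float):
    """Add Laplace(1/epsilon) noise to every bin count.

    For a histogram of counts, the global sensitivity of the query
    is 1 (one record moves one bin by 1), so the noise scale is
    sensitivity / epsilon = 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale) for c in counts]

# Example: publish a small 4-bin histogram with privacy budget epsilon = 0.5
noisy = publish_histogram([12, 40, 7, 31], epsilon=0.5)
print(noisy)  # four noisy, real-valued bin counts
```

A smaller epsilon gives stronger privacy but larger noise, which is exactly the accuracy-versus-privacy trade-off discussed above.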
This thesis mainly focuses on improving the grouping part. Unlike previous works, which group the data in a single pass, the grouping algorithm here is divided into two steps: the first step divides the whole data range into two parts, and the second step further partitions each of those two sub-sections. The advantage of this two-step division is that it effectively reduces the sensitivity of the algorithm and thereby improves the quality of the partition; the result is then further optimized. Extensive experiments in this thesis show that the proposed algorithm further reduces the error compared with existing optimization algorithms.
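The two-step structure described above can be sketched as follows. This is a hypothetical illustration, not the thesis's actual algorithm: the split criterion (minimizing the summed squared deviation of each segment from its own mean, a common grouping-error measure) and the helper names `best_split` and `two_step_partition` are assumptions, since the text does not specify them here.

```python
def best_split(counts):
    """Return the index that splits `counts` into two segments with
    minimal total within-segment squared error (assumed criterion)."""
    def sse(seg):
        if not seg:
            return 0.0
        m = sum(seg) / len(seg)
        return sum((x - m) ** 2 for x in seg)
    return min(range(1, len(counts)),
               key=lambda i: sse(counts[:i]) + sse(counts[i:]))

def two_step_partition(counts):
    """Step 1: split the whole histogram into two parts.
    Step 2: split each part again, yielding up to four groups."""
    i = best_split(counts)
    groups = []
    for part in (counts[:i], counts[i:]):
        if len(part) > 1:
            j = best_split(part)
            groups.extend([part[:j], part[j:]])
        else:
            groups.append(part)
    return groups

print(two_step_partition([3, 4, 4, 20, 22, 21, 5, 6]))
# → [[3], [4, 4], [20, 22, 21], [5, 6]]
```

Because each step only chooses a split within a sub-range rather than a full grouping of all bins at once, the search at each step is cheaper and, as the text argues, the sensitivity of the grouping decision is reduced.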