Research On Privacy Protection For Crowdsourcing Database

Posted on:2017-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:F Zhang

Full Text:PDF

GTID:2348330503489884

Subject:Computer software and theory

Abstract/Summary:

A crowdsourcing database is a new database model that combines machine and human intelligence to answer queries that are difficult for traditional relational databases. The core idea is to publish a query and the corresponding data as crowdsourcing tasks over the Internet, where they are ultimately solved by human intelligence. However, if a database containing private information is sent to the public Internet without any processing, that information may leak. Data anonymization has been studied extensively for traditional databases and has proved effective in practical applications such as data publishing, but existing anonymization algorithms cannot simply be applied to crowdsourcing databases. First, crowdsourcing databases are usually very large and stored in a distributed manner, and existing algorithms can hardly handle such large-scale, distributed data. Second, existing algorithms cause too much information loss, which reduces the quality of task completion.

To improve the quality of crowdsourced data, the space-partition-based Two-Phase Partition algorithm uses sampling to retain more information about the task, which increases the utility of the anonymized data. In the first phase, the coordinates of the samples serve as candidate dividers, which are selected by a valuation function. In the second phase, the space is partitioned recursively based on a kd-tree. To process large-scale, distributed databases efficiently, a parallel framework based on MapReduce is proposed to implement a parallel version of the Two-Phase Partition algorithm.
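
The recursive, kd-tree-based partitioning of the second phase can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not define the valuation function or how sampled dividers are chosen, so this sketch swaps in a plain median split on alternating axes. The names `kd_partition` and `bounding_box` and the group-size parameter `k` are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def kd_partition(points: List[Point], k: int, depth: int = 0) -> List[List[Point]]:
    """Split points kd-tree style; stop when a split would leave < k per side.

    Assumption: the divider is the median along the current axis, not a
    sample-based divider scored by a valuation function as in the thesis.
    """
    if len(points) < 2 * k:
        return [points]  # a further split would break the size-k guarantee
    axis = depth % len(points[0])  # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves keep at least k points
    return (kd_partition(ordered[:mid], k, depth + 1)
            + kd_partition(ordered[mid:], k, depth + 1))


def bounding_box(group: List[Point]) -> List[Tuple[float, float]]:
    """Generalize a group to one (min, max) range per dimension."""
    return [(min(coords), max(coords)) for coords in zip(*group)]


# Hypothetical two-dimensional records, e.g. two numeric attributes.
points = [(1, 1), (2, 5), (3, 2), (8, 9), (9, 7), (7, 8), (4, 4), (6, 6)]
groups = kd_partition(points, k=2)
boxes = [bounding_box(g) for g in groups]
```

Each record would then be published with the ranges of its group's box instead of its exact values, so that it cannot be told apart from the other records in the same group.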
The framework uses hashing to redistribute the original database into similar sub-databases, which are anonymized in parallel and then integrated to obtain the complete anonymized database.

Experiments show that the stand-alone Two-Phase Partition algorithm improves query accuracy by 20% compared with existing algorithms, and that accuracy rises with the sample ratio. When Two-Phase Partition is implemented on the parallel anonymization framework, query accuracy is slightly lower than that of the stand-alone version, but the decrease is less than 5%, and the time cost grows linearly with data size. The parallel anonymization strategy is therefore suitable for solving privacy problems in large-scale, distributed crowdsourcing databases.
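
The redistribute, anonymize, and integrate flow described above can be sketched as follows. This is a single-process illustration under stated assumptions, not the thesis's MapReduce code: records are hashed by a hypothetical record id with CRC32, the per-shard anonymizer `chunk_anonymize` is a simple stand-in rather than Two-Phase Partition, and the shards are processed in a loop where a real deployment would run one task per shard.

```python
import zlib
from typing import Callable, List, Tuple

Record = Tuple[str, Tuple[float, ...]]  # (record id, attribute values)
Group = List[Record]


def hash_shards(records: List[Record], n_shards: int) -> List[List[Record]]:
    """Redistribute records into n_shards sub-databases by hashing the id."""
    shards: List[List[Record]] = [[] for _ in range(n_shards)]
    for rec_id, values in records:
        idx = zlib.crc32(rec_id.encode("utf-8")) % n_shards
        shards[idx].append((rec_id, values))
    return shards


def chunk_anonymize(shard: List[Record], k: int) -> List[Group]:
    """Stand-in anonymizer: sort by values, cut into groups of at least k.

    A shard with fewer than k records yields one undersized group; this
    sketch does not handle that case.
    """
    ordered = sorted(shard, key=lambda r: r[1])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()  # fold a short last group into its neighbor
        groups[-1].extend(tail)
    return groups


def anonymize_sharded(
    records: List[Record],
    n_shards: int,
    k: int,
    anonymize: Callable[[List[Record], int], List[Group]] = chunk_anonymize,
) -> List[Group]:
    """Anonymize each shard independently, then merge the resulting groups."""
    merged: List[Group] = []
    for shard in hash_shards(records, n_shards):  # "map": one task per shard
        if shard:
            merged.extend(anonymize(shard, k))
    return merged  # "reduce": concatenate per-shard results


# Hypothetical input: 40 records with two numeric attributes each.
records = [(f"r{i}", (float(i % 7), float(i % 11))) for i in range(40)]
result = anonymize_sharded(records, n_shards=4, k=3)
```

Because each shard is anonymized without seeing the others, the merged groups can differ from what a single pass over the whole database would produce, which is consistent with the small accuracy loss the abstract reports for the parallel version.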

Keywords/Search Tags:

Crowdsourcing, Privacy Protection, Data Anonymization, Parallel Processing

Related items

1	Research On Privacy Protection Strategy For Crowdsourcing Task Data In Internet Of Things
2	Research And Implementation Of Data Anonymized Privacy Protection Method
3	The Research And Implementation Of Full-Domain Anonymization Algorithm Based On Cloud Platform
4	Research And Implementation Of Query Privacy Protection For Spatio-temporal Data Based On Spark
5	Research On Crowdsourcing Task Allocation Method For Workers’ Personalized Privacy Protection
6	Research On Trajectory Privacy Protection Technology For Data Publishing
7	Research On Privacy-preserving Anonymization Techniques For Social Network Data
8	Research On Differential Privacy Protection Methods In Spatial Crowdsourcing Environment
9	Research On Privacy Protection Technology Of Data Publishing Based On Anonymization
10	Research On Privacy Protection Mechanism Of Mobile Crowdsourcing In Edge Cloud Environment