| A crowdsourcing database is a new database model that combines machine and human intelligence to answer queries that are difficult for traditional relational databases. The core idea is to publish a query and the corresponding data as crowdsourcing tasks over the Internet, where they are ultimately solved by human intelligence. However, if a database containing private information is sent to the public Internet without any preprocessing, that private information may leak. Data anonymization has been studied extensively for traditional databases and has proved effective in practical applications such as data publishing. However, existing anonymization algorithms cannot simply be applied to crowdsourcing databases. First, crowdsourcing databases are usually very large and stored in a distributed fashion, and existing algorithms can hardly handle such large-scale, distributed data. Second, existing algorithms cause too much information loss, which reduces the quality of task completion. To improve the quality of crowdsourced data, the space-partition-based Two-Phase Partition algorithm uses sampling to retain more information about the task, which increases the utility of the anonymized data. In the first phase, the coordinates of the samples serve as candidate dividers, which are selected by a valuation function. In the second phase, the space is partitioned recursively using a kd-tree. To process large-scale, distributed databases efficiently, a parallel framework based on MapReduce is proposed to implement a parallel version of the Two-Phase Partition algorithm. 
The framework uses hashing to redistribute the original database into similar sub-databases, which are anonymized in parallel and then integrated to obtain the complete anonymized database. Experiments show that the stand-alone Two-Phase Partition algorithm achieves a 20% improvement in query accuracy compared with existing algorithms, and that the accuracy increases as the sample ratio rises. When the parallel version of Two-Phase Partition is implemented with the parallel anonymization framework, query accuracy is slightly lower than that of the stand-alone version, but the decrease is less than 5%, and the time cost grows linearly with data size. The parallel anonymization strategy is therefore suitable for solving privacy problems in large-scale, distributed crowdsourcing databases. |
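
The pipeline summarized above (hash redistribution, sample-based candidate dividers, recursive kd-tree-style partitioning, and integration of the results) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the valuation function, the hash key, or the anonymity model, so the sketch assumes a balance-based valuation, hashing on a record's position, and a k-anonymity-style rule that every group holds at least `k` records. All function names are hypothetical, and a plain loop stands in for the MapReduce map tasks.

```python
import random


def hash_split(records, n_parts):
    """Redistribute records into n_parts sub-databases (assumed key: record position)."""
    parts = [[] for _ in range(n_parts)]
    for rec_id, rec in enumerate(records):
        parts[hash(rec_id) % n_parts].append(rec)
    return parts


def spread(records, d):
    """Value range of dimension d, used to try the widest dimension first."""
    vals = [r[d] for r in records]
    return max(vals) - min(vals)


def partition(records, k, candidates):
    """Recursively split records kd-tree style; every returned group has >= k records."""
    if len(records) >= 2 * k:
        dims = sorted(range(len(records[0])), key=lambda d: -spread(records, d))
        for d in dims:
            best = None
            # Candidate dividers are the sampled points' coordinates in dimension d.
            for cut in sorted({c[d] for c in candidates}):
                left = [r for r in records if r[d] <= cut]
                right = [r for r in records if r[d] > cut]
                if len(left) >= k and len(right) >= k:
                    # Assumed valuation function: prefer the most balanced split.
                    score = abs(len(left) - len(right))
                    if best is None or score < best[0]:
                        best = (score, left, right)
            if best is not None:
                return (partition(best[1], k, candidates)
                        + partition(best[2], k, candidates))
    return [records]  # no admissible divider: keep the records as one group


def bounding_box(group):
    """Generalize a group to its (lower corner, upper corner) bounding box."""
    dims = range(len(group[0]))
    return (tuple(min(r[d] for r in group) for d in dims),
            tuple(max(r[d] for r in group) for d in dims))


def anonymize(records, k, sample_ratio, rng):
    """Sample candidate dividers, partition, and return (box, group size) pairs."""
    n_samples = max(1, int(len(records) * sample_ratio))
    sample = rng.sample(records, n_samples)
    return [(bounding_box(g), len(g)) for g in partition(records, k, sample)]


def parallel_anonymize(records, k, sample_ratio, n_parts, seed=0):
    """Anonymize each sub-database independently, then merge the results.

    Assumes every sub-database holds at least k records.
    """
    rng = random.Random(seed)
    result = []
    for part in hash_split(records, n_parts):  # stand-in for parallel map tasks
        result.extend(anonymize(part, k, sample_ratio, rng))
    return result
```

For example, `parallel_anonymize(points, k=5, sample_ratio=0.2, n_parts=4)` on a list of 2-D tuples returns one bounding box per group together with the group size. A higher `sample_ratio` offers more candidate dividers, which in this sketch allows finer splits at a higher search cost.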