| The census is a nationwide collection of population information and an important method for gathering population data in countries around the world. The seventh national census introduces many changes in census methods, such as Internet self-enumeration, data collection on electronic devices, real-time data upload, comparison against administrative records, and the combination of big data technology with modern information management technology. In this context, how to use big data technology to assist the census and improve its efficiency is a current concern. Written from the perspective of an engineering master's candidate and statistician, and intended to support the seventh national census, this thesis addresses the big data application problems encountered in census work and uses Top-k query technology to query and analyze census big data. On this basis, a census big data application system is researched and implemented. The main content of the thesis is as follows: (1) The application of census big data is explored. The thesis introduces the composition of census big data and analyzes the characteristics of its data sources, such as massive household electricity consumption data, encrypted administrative record data, and mobile phone signaling data. Because census big data is stored in a distributed system, traditional small-scale centralized query methods can no longer be applied directly. Based on the characteristics of census big data, three key query requirements are identified: massive distributed data, privacy protection, and uncertainty. (2) Top-k query methods for census big data are studied. For the massive data query requirement, the thesis studies data partitioning to reduce the amount of data queried and implements a distributed Top-k query algorithm. For the privacy protection requirement, data filtering and homomorphic encryption are studied to implement a Top-k query algorithm for vertically distributed encrypted data. For the uncertain query requirement, global sorting and incremental methods are adopted to implement an efficient uncertain Top-k query algorithm. (3) The census big data application system is designed and implemented. Driven by the requirements of vacant household determination, family situation analysis, flow analysis, and related problems, the overall design of the census big data application system is carried out, and the relevant functional modules are designed and implemented. Data from public security, health, and power supply departments are cross-verified against mobile phone signaling data to assist census analysis and decision-making and to improve the scientific rigor and accuracy of the census. |