| In recent years, with the rapid development of new technologies such as 5G, cloud computing, big data and artificial intelligence, the State Council has attached great importance to the sharing and integration of government information in China. Because data is collected by different methods and stored in different formats, it is difficult to store and manage multi-source heterogeneous data uniformly when a traditional database or data warehouse serves as the data center of the sharing and exchange platform, and this makes later maintenance and expansion of the system harder. The sensitivity of government data also requires attention: once sensitive information is leaked, it can have a serious impact on society or individuals. In addition, the inconsistent data formats and duplicate records that come with multi-source heterogeneous data lower the overall quality of government data, which reduces the efficiency of information sharing and exchange. Solving these problems has become a new challenge for sharing and exchange systems. To address the unified storage and management of multi-source heterogeneous data, this paper uses a data lake to build the data center, which handles such data flexibly and can be expanded easily as the data volume continues to grow. To address the security of government data, this paper studies data desensitization techniques and designs a dynamic data desensitization process model, so that data can be desensitized according to departmental needs before it is shared and exchanged. It also defines evaluation criteria for desensitization algorithms, designs desensitization algorithms for several types of sensitive data, and formulates design principles for desensitizing numerical sensitive government data. To address the inconsistent data formats caused by multi-source data, this paper proposes a method based on information entropy that automatically generates regular expressions for format detection and unifies data formats according to the detection results. To address data duplication, it proposes a deduplication method that applies SimHash to structured data, which improves the deduplication efficiency for massive structured data. Based on these methods, this paper designs and implements a government data sharing and exchange system. It first analyzes the system requirements in detail and divides the system functions into three modules: data source management, data management and data processing. It then carries out the architecture design and the implementation of each module, and finally designs test cases to verify the functional and non-functional requirements of the system. |
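
The SimHash-based deduplication of structured data mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how records are tokenized, how fields are weighted, or which distance threshold is used, so everything below is an assumption made for illustration: each `field=value` pair is treated as one equally weighted token, the fingerprint is 64 bits wide, and the helper names (`simhash`, `hamming`, `deduplicate`) and the `threshold` default are hypothetical rather than taken from the paper.

```python
import hashlib


def _hash64(token: str) -> int:
    """Stable 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(record: dict, bits: int = 64) -> int:
    """SimHash fingerprint of a record; one token per field=value pair (assumed)."""
    weights = [0] * bits
    for field, value in record.items():
        h = _hash64(f"{field}={value}")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(records, threshold: int = 3):
    """Keep a record only if its fingerprint is farther than `threshold`
    bits from every fingerprint already kept (threshold is an assumption)."""
    kept, fingerprints = [], []
    for record in records:
        fp = simhash(record)
        if all(hamming(fp, seen) > threshold for seen in fingerprints):
            kept.append(record)
            fingerprints.append(fp)
    return kept


if __name__ == "__main__":
    a = {"name": "Zhang San", "dept": "Civil Affairs", "phone": "010-12345678"}
    b = dict(a)  # exact duplicate arriving from a second source
    print(len(deduplicate([a, b])))
```

This sketch compares every new fingerprint with all kept ones, which is quadratic in the number of records; it is meant only to show the fingerprint-and-compare idea, not the indexing a system for massive data would need. With only a few coarse tokens per record, fingerprints of near-duplicates can still differ in many bits, so a practical tokenization would likely be finer-grained than whole field values.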