| Scientific data is an important strategic resource for national scientific and technological innovation and sustainable development. Population health is one of the most active fields of scientific research, and its scientific data is widely used in drug research and development, epidemic monitoring, public health monitoring, clinical trial data analysis, drug safety and effectiveness assessment, health economics evaluation, and other areas. Data sharing is an important means of realizing the scientific, social, and economic value of population health scientific data. Scientific data is generally organized and shared through sharing platforms, which serve as key infrastructure. However, population health scientific data is complex in type and diverse in form. It usually contains sensitive information such as personal identifiers, diagnosis and treatment results, and medical expenses, and it is both highly private and domain-specific. Once leaked, it may cause great harm to the country, society, and individuals. The protection of sensitive information and data security has therefore become a constraint on the sharing of population health scientific data, and it places higher demands on how sharing platforms control sensitive information. In recent years, the management and sharing of population health scientific data has received increasing attention, and great progress has been made in building sharing platforms. The National Population Health Data Archive (PHDA), certified by the Ministry of Science and Technology and the Ministry of Finance of the People's Republic of China, is the largest population health science and technology resource sharing service platform in China. It undertakes the integration, collection, and sharing of national scientific data in the field of population health. To realize hierarchical management and safe sharing of data, it is necessary to detect and evaluate the
sensitive information in the data submitted by users. This study therefore focuses on protecting sensitive information in the secure sharing of population health scientific data. In line with the construction goals of PHDA, the national population health scientific data repository, it examines in depth the requirements for sensitive information detection and sensitivity assessment, and aims to develop a data sensitivity assessment method that can serve as a reference for hierarchical data management and secure sharing on population health scientific data sharing platforms. The main work of this study consists of three parts. First, a requirements analysis for data sensitivity assessment was carried out for PHDA, and the research content of the sensitivity assessment was clarified on that basis. The data characteristics of PHDA were analyzed in terms of data type, volume, and organizational structure, and the current state of sensitive information management and control in PHDA was analyzed through the sensitive information control process and a survey of stakeholder needs. The corresponding requirements for data sensitivity assessment were then summarized. Finally, the assessment object, the research content, and the assessment scenarios were defined, laying the foundation for the orderly development of the follow-up work. Second, a data sensitivity assessment method was designed to meet the sharing needs of population health data. Based on the requirements of policies and regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and Information security technology—Personal information security specification, this method refines the categories of sensitive information in population health scientific data and constructs a sensitive information recognition dictionary and rule base according to the content and format characteristics of each sensitive information type. Sensitive features are identified and analyzed at the
level of metadata, data items, and data values. In addition, a unified sensitivity assessment standard is defined along two dimensions, degree of identification and degree of leakage loss, and data sensitivity is calculated on that basis. Finally, a data sensitivity assessment report is generated for each data set to mark, describe, and reveal the sensitive information it contains, providing a reference for the hierarchical management and secure sharing of population health scientific data. Third, a demonstration and expert evaluation were conducted. Real-world data sets in PHDA were used to assess data sensitivity, and experts were invited to evaluate the generated sensitivity assessment reports, in order to verify the application effect of the proposed method and to evaluate its feasibility, scientific soundness, practicability, and effectiveness. The innovation of this research lies in constructing a data sensitivity assessment framework for population health scientific data sharing in China, which meets PHDA's need for data sensitivity assessment based on sensitive information identification and feature analysis. Specifically, the categories of sensitive information were determined, a sensitive information identification dictionary and rule base were constructed, and a unified sensitivity assessment standard for population health scientific data was set. The theoretical value of this study is to provide theoretical support for research on secure data sharing and sensitive information protection; its application value is to identify sensitive information in PHDA population health scientific data and to provide a reference for data managers assessing data sensitivity. |
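
The dictionary-and-rule detection and the two-dimensional scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the dictionary entries, regular expressions, category names, the 1 to 5 score scale, and the rule for combining the two degrees (their product, maximized over findings) are all assumptions made for illustration, since the abstract does not specify them. The sketch covers only the data item and data value levels, not metadata.

```python
import re
from dataclasses import dataclass

# Hypothetical dictionary: keyword in a column name -> sensitive category.
FIELD_DICTIONARY = {
    "name": "personal_identifier",
    "id_number": "personal_identifier",
    "phone": "contact",
    "diagnosis": "diagnosis_treatment",
    "cost": "medical_expense",
}

# Hypothetical rule base: format patterns applied to the data values.
VALUE_RULES = {
    "contact": re.compile(r"1\d{10}"),                 # 11-digit number starting with 1
    "personal_identifier": re.compile(r"\d{17}[\dXx]"),  # 18-character ID-like string
}

# Assumed scores per category: (identification degree, leakage loss degree), 1-5.
CATEGORY_SCORES = {
    "personal_identifier": (5, 4),
    "contact": (4, 3),
    "diagnosis_treatment": (3, 5),
    "medical_expense": (2, 3),
}


@dataclass
class Finding:
    column: str
    category: str
    level: str  # "data item" (matched by name) or "data value" (matched by format)


def detect(columns):
    """Return one Finding per column that matches the dictionary or a rule."""
    findings = []
    for column, values in columns.items():
        lowered = column.lower()
        # Data item level: look the column name up in the dictionary.
        for keyword, category in FIELD_DICTIONARY.items():
            if keyword in lowered:
                findings.append(Finding(column, category, "data item"))
                break
        else:
            # Data value level: every value must match the format rule.
            for category, pattern in VALUE_RULES.items():
                if values and all(pattern.fullmatch(v) for v in values):
                    findings.append(Finding(column, category, "data value"))
                    break
    return findings


def sensitivity(findings):
    """Assumed combination rule: highest identification * loss among findings."""
    return max(
        (CATEGORY_SCORES[f.category][0] * CATEGORY_SCORES[f.category][1]
         for f in findings),
        default=0,
    )


if __name__ == "__main__":
    sample = {
        "patient_name": ["Zhang San"],
        "col_7": ["13800000000"],
        "visit_count": ["3"],
    }
    found = detect(sample)
    for f in found:
        print(f.column, f.category, f.level)
    print("data set sensitivity:", sensitivity(found))
```

Taking the maximum rather than a sum reflects one possible design choice: a data set is treated as being as sensitive as its most sensitive field. A real assessment standard would need to follow the scoring rules defined in the study itself.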