Study On Metadata Heterogeneity And Its Standardized Application Supported By Biomedical Ontology

Posted on: 2020-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhang
GTID: 2370330578483640
Subject: Biomedical engineering
Abstract/Summary:
Background: Data has become an important driving force of biomedical development. The key to data-to-knowledge transformation is stronger machine readability. Common data elements (CDEs) are an important means of making metadata more understandable to machines. As shared data in the biomedical field grows, the number of data elements stored in open databases is also increasing rapidly. It is therefore of great significance to study how to use common data elements effectively to promote data integration and sharing.

Methods: On the one hand, we established a semantically supported CDE representation model based on the ISO/IEC 11179 standard and constructed a reusable common data element database on top of this model. In this part of the study, the data entries in the database were first determined according to the National physique and Health Database, and the CDE set was constructed by reusing CDEs from caDSR and creating new ones. All CDEs were then transformed into OWL format, and their quality was inspected with a semantic network tool. Finally, a graph database was used to store the files, and complex queries were supported through SPARQL. On the other hand, we studied the heterogeneity of metadata in the biomedical field and established a model that predicts the compatibility between two related metadata items. First, we selected data elements related to epidemic investigation from the public NCI caDSR database and extracted their essential components according to the CDE representation model. Second, we calculated the similarity between the components of each pair of data elements using an ontology-based semantic similarity method supported by NCIT (National Cancer Institute Thesaurus). Finally, a support vector machine (SVM) model was built to predict the compatibility between data elements from the semantic similarity between their CDE components.

Results: In this study, we first built a representation model of data elements based on the ISO/IEC 11179 metadata standard. The model specifies an ontology-based method of semantic standardization and defines the relationships between the essential components of a CDE. Every CDE requires a unique identifier, and the final file is represented in OWL format. Data elements from the National physique and Health Database were stored in a graph database and can be retrieved on the basis of the representation model. The results show that heterogeneity between common data elements is apparent in the metadata definitions in the caDSR database, especially in the conceptual domain. An SVM model was therefore built to predict the interoperability between CDEs. After parameter optimization, the overall accuracy reached 81.67% in three-category classification.

Conclusions: In this study, a common representation model for data elements that meets the FAIR criteria was established. Based on this model, we built a reference common data element database from the data items in the National physique and Health Database, which provides a feasible way to overcome the data integration difficulties caused by data heterogeneity. In view of the serious heterogeneity of data elements in current CDE databases, we also constructed a prediction model of CDE compatibility, which provides technical support for users of existing CDEs. Our study provides technical support and tools for improving both metadata quality and data quality.
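
The abstract does not say which ontology-based similarity measure was applied to the CDE components. As a rough illustration only, the sketch below uses the Wu-Palmer measure, a common choice for is-a hierarchies, over a tiny made-up hierarchy. The concept names, the `PARENT` table, the component keys `object_class` and `property` (borrowed from ISO/IEC 11179 terminology), and the helper names are all assumptions for illustration and are not taken from the thesis or from NCIT. The per-component similarities it returns are the kind of feature vector that could then be passed to an SVM classifier.

```python
# Toy is-a hierarchy (child -> parent). The concept names are hypothetical
# placeholders, not real NCIT entries.
PARENT = {
    "Person": "Thing",
    "Patient": "Person",
    "Measurement": "Thing",
    "Body Weight": "Measurement",
    "Body Height": "Measurement",
    "Finding": "Thing",
    "Symptom": "Finding",
    "Fever": "Symptom",
}


def path_to_root(concept):
    """Return the chain [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of (or equal to) both."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts share no ancestor")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def similarity_features(element_a, element_b,
                        components=("object_class", "property")):
    """One similarity value per CDE component (component keys are assumed)."""
    return [wu_palmer(element_a[c], element_b[c]) for c in components]


if __name__ == "__main__":
    weight = {"object_class": "Patient", "property": "Body Weight"}
    height = {"object_class": "Patient", "property": "Body Height"}
    print(similarity_features(weight, height))
```

Identical concepts score 1.0, and the score falls as the shared ancestor moves closer to the root, so two data elements that differ only in a closely related property still yield a high feature value for that component.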
Keywords/Search Tags: metadata management, common data element, ontology, semantic web, machine learning