Research On Hadoop-based Medical Data Storage

Posted on: 2023-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: H B Xing
Full Text: PDF
GTID: 2544306815491814
Subject: Computer Science and Technology
Abstract/Summary:
As science and technology advance and quality of life improves, people attach growing importance to their own health. The rapid development of medical informatization, together with proposals for health information services and smart medical services that benefit the whole population, has generated a wide range of medical data. On the one hand, these data come not only from the internal business systems of the medical industry but also from external, medically relevant information systems, such as users' geographic information and weather and environmental data; medical data is growing rapidly and shows the characteristics of big data. On the other hand, health care and related applications are moving toward intelligent and lean management. How to use big data technology rationally to broaden and deepen the application of smart medical care and related technologies, and thereby advance medical big data applications, has become both a new challenge and a new opportunity for the medical industry. However, existing medical data storage and management platforms are far from able to meet the growing demand for storing and managing massive medical data.

This thesis studies in depth the problems of multi-source integration, storage optimization, correlation query, and parallel analysis and processing of big data on a distributed data store. First, based on the common characteristics of the different source systems, it builds a data exchange and migration method that meets the needs of each system, so that multi-source medical big data can be migrated across system platforms. Second, to address the data heterogeneity that arises during multi-source integration, it generates standardized metadata and establishes a corresponding metadata data dictionary, which enables efficient normalization and integration of the data.

Building on cross-platform migration and diversified integration, and targeting the core problems of efficient storage and efficient query of medical big data, the thesis uses the Hadoop architecture to study storage optimization in view of the requirements that medical automation and related applications place on multi-source medical big data storage, query, and analysis management. It proposes a hash bucket storage algorithm for medical big data that takes data correlation into account, so that related data are stored together efficiently, which in turn significantly improves the efficiency of later data query and management. Experiments on Hadoop cluster nodes compare the approach with relational database storage and with unoptimized Hadoop. Correlation queries optimized by the hash bucket algorithm take about 40% of the time of unoptimized Hadoop queries and only 17% of the query time required with relational database storage. These results demonstrate the effectiveness of hash bucket storage optimization for correlation queries and parallel processing of multi-source medical big data.
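
The abstract does not spell out the hash bucket storage algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: records from different sources share a key (a hypothetical `patient_id` field), a stable hash of that key selects one of a fixed number of buckets, and a correlation query then scans a single bucket instead of the whole data set. The bucket count, field names, and sample records are all illustrative and are not taken from the thesis.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical bucket count, not from the thesis


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and so is not stable across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_buckets(records, key_field="patient_id", num_buckets=NUM_BUCKETS):
    """Group records so that all records sharing a key land in one bucket."""
    buckets = defaultdict(list)
    for record in records:
        index = bucket_for(str(record[key_field]), num_buckets)
        buckets[index].append(record)
    return buckets


def correlated_lookup(buckets, key, key_field="patient_id",
                      num_buckets=NUM_BUCKETS):
    """Return every record for `key`, scanning only the bucket it hashes to."""
    candidates = buckets.get(bucket_for(str(key), num_buckets), [])
    return [r for r in candidates if str(r[key_field]) == str(key)]


# Toy records from two hypothetical sources (assumed field names).
records = [
    {"patient_id": "P001", "source": "his", "diagnosis": "hypertension"},
    {"patient_id": "P002", "source": "his", "diagnosis": "asthma"},
    {"patient_id": "P001", "source": "weather", "temperature_c": 31},
]

buckets = build_buckets(records)
matches = correlated_lookup(buckets, "P001")
```

Here `matches` holds both `P001` records, one from each source, because identical keys always hash to the same bucket. On a real cluster the buckets would be files or partitions rather than in-memory lists; that mapping is left out of this sketch.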
Keywords/Search Tags: Medical data, Hadoop, Data integration, Hash bucket algorithm