
Techniques Optimization Research On Data Statistics Function Of Military Electronic Health Records System

Posted on: 2019-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Chi
Full Text: PDF
GTID: 2404330542497339
Subject: Biomedical engineering
Abstract/Summary:
As an important subsystem of the military electronic health records (MEHR) system, the data statistics function provides data summaries, distribution characteristics, trends, and data relationships for medical service management, disease prevention and control, data mining of soldiers' health, and personal health management. The MEHR system currently uses the MongoDB document database as its storage engine, which extends flexibly to accommodate the health data model and retrieves health records efficiently. Although statistics written in the MongoDB aggregation language are usable in basic troops, the document storage pattern has several weaknesses for data statistics: data of the same type is stored in scattered locations, documents contain many complicated nested levels, and the aggregation operators are inadequate. These issues make it hard to meet the efficiency demands of army-wide health data statistics and to support responsive ad-hoc analysis with interactive and intuitive operations. Against this application background and these practical problems, we study how to use big data techniques to enhance the functions and improve the performance of data statistics in the MEHR system.

First, we consider two aspects. On the one hand, the multistage deployment of MEHR generates different amounts of data; on the other hand, users run statistics either in a regular, fixed way or in a temporary, ad-hoc way. Accordingly, we divide the statistical application scenarios into three kinds: basic troops fixed statistics, data center fixed statistics, and data center ad-hoc statistics (here and below, "data center" means the health data center of the whole army). We then analyze the functional, performance, and other requirements of the statistical subsystem. The functions include data preprocessing, statistical analysis, visualization, and system management.

Second, we propose principles for technical choice based on the system users, usage preferences, time of use, and other features of each statistical scenario. We then investigate mainstream techniques, compare their characteristics, and reach a conclusion on technique selection. In basic troops, the system users are unit leaders or designated staff who want simple, practical statistics functions, so frequently used, fixed statistical indicators are sufficient. We therefore apply the MongoDB aggregation pipeline to basic troops fixed statistics, which simplifies system design and deployment, reduces maintenance cost, and still satisfies the application requirements. For data center statistics, by contrast, leaders in charge of medical services and statistics-related business segments demand complex indicators, multidimensional analysis, and interactive operations for certain tasks, including daily fixed report statistics and occasional exploratory statistics. As a result, we adopt big data techniques such as Spark (a distributed in-memory computing framework) and CarbonData (a columnar storage format) to meet the performance requirements of fixed and ad-hoc statistics on big data.

Third, we design the overall architecture of the MEHR statistics subsystem and elaborate on each part of it. The result is a four-layer structure consisting, from bottom to top, of a records storage layer, a statistical engine layer, an interface layer, and an application layer. In the records storage layer, we study the theory and rules of the MongoDB replica set architecture in order to improve the security and robustness of health records storage. In the statistical engine layer, we analyze how the statistical techniques apply to the three scenarios: the MongoDB aggregation pipeline, the CarbonData columnar storage format, and the Spark SQL interactive processing engine. We place all of these techniques in a unified framework that is configurable and switchable. The interface layer contains the workflow and interface examples for statistical requests and responses. For the application layer, we propose a set of usable statistical charts.

Finally, we implement and validate the statistical engine layer and the application layer. For basic troops fixed statistics, we write code snippets that precompute four frequently used statistical indicators. For data center fixed statistics, we establish a bidirectional connection between MongoDB and Spark. For data center ad-hoc statistics, we build a pre-associated data model for outpatient services and store it in CarbonData. In addition, common statistical charts are designed and developed. Based on this research and practice, we prepare a validation environment, mock data, and classic cases for ad-hoc statistics, and we test and verify the functions and performance. The results show that the ad-hoc statistical function designed and implemented in this research responds within seconds on hundreds of millions of health records from millions of people. It therefore fulfills the statistical analysis requirements of the whole army's health data. The research results of this thesis provide both a theoretical basis and a technical reference for optimizing and improving the MEHR data statistics subsystem.
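
To make the basic troops fixed statistics scenario more concrete, the following is a minimal sketch of what a precomputed indicator might look like as a MongoDB aggregation pipeline. It is not taken from the thesis: the indicator (outpatient visits per unit per month), the function name `build_monthly_visit_pipeline`, and the field names `unit`, `visits`, and `visits.date` are all hypothetical. The pipeline is built as plain Python data so that it can be inspected without a database; the `$unwind` stage illustrates how nested record levels have to be flattened before grouping.

```python
from datetime import datetime


def build_monthly_visit_pipeline(start, end):
    """Build a pipeline that counts outpatient visits per unit per month.

    Assumed (hypothetical) document shape:
        {"unit": "...", "visits": [{"date": <datetime>, ...}, ...]}
    """
    date_range = {"$gte": start, "$lt": end}
    return [
        # Keep only records with at least one visit in the date range.
        {"$match": {"visits.date": date_range}},
        # Flatten the nested visit array: one output document per visit.
        {"$unwind": "$visits"},
        # Drop flattened visits that fall outside the date range.
        {"$match": {"visits.date": date_range}},
        # Count visits for each (unit, year, month) combination.
        {"$group": {
            "_id": {
                "unit": "$unit",
                "year": {"$year": "$visits.date"},
                "month": {"$month": "$visits.date"},
            },
            "visit_count": {"$sum": 1},
        }},
        # Order the result for reporting.
        {"$sort": {"_id.unit": 1, "_id.year": 1, "_id.month": 1}},
    ]


pipeline = build_monthly_visit_pipeline(datetime(2018, 1, 1), datetime(2019, 1, 1))
# With pymongo, the list could be passed to collection.aggregate(pipeline);
# the collection itself is assumed and not shown here.
```

A fixed indicator of this kind can be run on a schedule and its result stored for later display, which fits the simple, low-maintenance setup described for basic troops. The same flattening and grouping become costly at data center scale, which is the motivation the abstract gives for moving those scenarios to Spark and CarbonData.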
Keywords/Search Tags:Military Electronic Health Records, Big Data Statistical Technology, MongoDB Aggregation Pipeline, CarbonData Columnar Storage Format, Spark SQL Interactive Processing Framework