Research On Medical Record Text Clustering System Based On DOA

Posted on: 2021-04-26  Degree: Master  Type: Thesis
Country: China  Candidate: J Li  Full Text: PDF
GTID: 2404330647463665  Subject: Computer technology
Abstract/Summary:
With the development of medical informatization, massive, distributed, and heterogeneous medical data are being generated. Because these data are stored across many different medical IT systems, they are difficult to manage and use. Electronic medical records are an important part of medical data, and their rapid growth makes it hard for medical staff to find the target record accurately and quickly. Organizing and classifying medical record data can provide a basis for retrieval. Existing text clustering systems usually target a single data source, so they cannot handle medical record data from multiple sources and cannot manage the data effectively. As a result, they cannot perform efficient automatic classification and topic extraction on massive collections of electronic medical records.

Combining Data Oriented Architecture (DOA) with cluster analysis can address these problems. DOA is data-oriented and data-centric, and it manages data through the Data Register Center (DRC). The DRC can integrate multi-source, multi-type medical record data and thereby supports the subsequent management, processing, and analysis of that data. Cluster analysis lets medical institutions classify medical records and extract useful information without manual annotation. This thesis therefore builds a DOA-based clustering analysis system for medical text. Processing and experiments on 5000 real medical records show that the function and performance of the system reach the expected design goals.

The main research contents of this thesis are as follows:
(1) Starting from the DOA idea, the thesis analyzes the characteristics and mining needs of medical text data, designs a metadata registration specification for medical text data, and studies metadata registration methods and the implementation plan of the DRC.
(2) It studies a clustering scheme that classifies medical records, using the Canopy algorithm to select the initial cluster points and the K-means algorithm to cluster both the text vectors and the Simhash values registered in the DRC.
(3) It studies preprocessing methods for medical record text, including word segmentation, stop word filtering, feature extraction, TF-IDF (Term Frequency-Inverse Document Frequency) calculation, and construction of text vectors. These steps convert text data into text vectors, which serve as the input to text clustering.
(4) It studies a parallel implementation of the K-means algorithm on the MapReduce programming model, optimizes the computation that leads from local centroids to the global centroid, and proposes an optimization strategy based on an improved Combiner. Deferring the merge so that the global centroid is obtained in the Reduce stage improves clustering quality.

The research results and innovations of this thesis are as follows:
(1) A DRC data registration specification for medical record text mining is proposed. Based on a study of the characteristics and mining requirements of medical record data, the thesis designs a registration specification suited to managing and mining that data, together with the corresponding registration methods and implementation schemes.
(2) A scheme for clustering medical records by the Simhash values held in the DRC registration information is proposed. Its clustering efficiency is higher than that of text clustering.
(3) A parallel K-means design with an improved Combiner process is proposed, which improves clustering quality.
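
The abstract does not spell out how the improved Combiner in research content (4) works. A minimal Python sketch of one common approach, assumed here and not taken from the thesis, is to have each Combiner emit per-cluster partial sums and point counts instead of a local centroid, so that the Reduce stage computes the exact global centroid. The function names (`nearest`, `map_and_combine`, `reduce_centroids`) are hypothetical, and plain Python functions stand in for the MapReduce stages.

```python
from typing import Dict, Iterable, List, Tuple

Vector = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (component-wise sum, point count)


def nearest(point: Vector, centroids: List[Vector]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_and_combine(split: Iterable[Vector], centroids: List[Vector]) -> Dict[int, Partial]:
    """Map + Combiner for one input split.

    Emits a (sum, count) pair per cluster rather than a local mean, so no
    information is lost before the Reduce stage.
    """
    partials: Dict[int, Partial] = {}
    for point in split:
        k = nearest(point, centroids)
        total, count = partials.get(k, ([0.0] * len(point), 0))
        total = [t + v for t, v in zip(total, point)]
        partials[k] = (total, count + 1)
    return partials


def reduce_centroids(all_partials: Iterable[Dict[int, Partial]],
                     centroids: List[Vector]) -> List[Vector]:
    """Reduce: merge partial sums from every split, then divide once."""
    merged: Dict[int, Partial] = {}
    for partials in all_partials:
        for k, (total, count) in partials.items():
            msum, mcount = merged.get(k, ([0.0] * len(total), 0))
            merged[k] = ([a + b for a, b in zip(msum, total)], mcount + count)
    new_centroids = list(centroids)  # clusters with no points keep their old centroid
    for k, (total, count) in merged.items():
        new_centroids[k] = tuple(v / count for v in total)
    return new_centroids


# Two unevenly sized splits; toy 2-D points stand in for TF-IDF text vectors.
centroids = [(0.0, 0.0), (10.0, 0.0)]
split_a = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
split_b = [(4.0, 0.0)]
updated = reduce_centroids(
    [map_and_combine(split_a, centroids), map_and_combine(split_b, centroids)],
    centroids,
)
```

In this toy run, cluster 0 receives the points 0, 2, and 4 on the first axis, so the merged sums give a centroid at 2.0. Averaging the two local means (1.0 and 4.0) without weights would give 2.5, which is the kind of error that carrying counts through to the Reduce stage avoids.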
Keywords/Search Tags: Text mining, Cluster analysis, Medical record classification, DOA