| With the rapid development of information technology, the era of big data and intelligence, in which data plays a central role, has brought new opportunities and challenges to every industry. In recent years, medical big data has emerged as a new focus area and attracted wide attention. Countless patients visit hospitals every day, and large amounts of medical data are generated during consultations. Doctors enter patients' medical records manually through information systems, and most of these records are unstructured. Describing diagnoses and examination results in terms familiar to doctors lets them enter information quickly, accurately, and conveniently, so current medical documents, especially the symptom descriptions, consist mostly of unstructured text written in doctors' colloquial language. As a result, the data are heterogeneous, distributed, fragmented, and non-standardized, and values are sometimes missing, all of which hinders interpretation and processing. Because medical data are stored in unstructured form, computers cannot process and analyze them directly; analysis is inefficient and its quality cannot be guaranteed. The methods currently used in information extraction research scale poorly and have other limitations, so the degree of automation remains low. To analyze and mine medical record data effectively with existing methods, and to make better use of that data, how to structure medical data effectively has become a problem worth studying. Against this background, a retrieval system project based on similar medical records was launched. The project aims to build a medical record retrieval system that is general-purpose, accurate, convenient, easy to operate, efficient, and able to process heterogeneous medical data. On a medical data platform composed of various medical records, the system searches for similar medical records according to the patient's condition: it takes a whole medical record as input and outputs similar medical records to assist diagnosis. The work of this paper is the data processing stage of the medical record retrieval system. Its contribution is to improve existing methods for word-segmentation disambiguation and error correction in natural language processing, to optimize the algorithms for the medical domain, and to combine the LOINC database, knowledge graphs, and related tools and technologies to process the medical data in medical records, thereby providing data support for the next stage of the project. First, unstructured medical named entities in the data are identified with an improved terminology extraction algorithm; tag extraction and vector construction are then carried out through semantic analysis, and the unstructured data are structured with reference to the LOINC database, making the data descriptions more accurate and uniform. Second, correlations are used to describe the unstructured medical named entities. Finally, a visualized patient case portrait is built with a knowledge graph, which shows the knowledge structure and its relationships through content analysis and visualization and, to some extent, mitigates the problem of missing data. |
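
The pipeline outlined in the abstract (normalize colloquial terms to a standard vocabulary, build a vector per record, retrieve similar records) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the phrase table `TERM_TO_CODE`, the placeholder codes such as `C_FEVER` (which are not real LOINC codes), the sample `RECORDS`, and the choice of cosine similarity over bag-of-code counts are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical mapping from colloquial phrases to placeholder concept codes.
# These are NOT real LOINC codes; a real system would consult the LOINC database.
TERM_TO_CODE = {
    "high temperature": "C_FEVER",
    "fever": "C_FEVER",
    "coughing": "C_COUGH",
    "cough": "C_COUGH",
    "headache": "C_HEADACHE",
    "head pain": "C_HEADACHE",
    "chest pain": "C_CHEST_PAIN",
}

# Toy records invented for this example.
RECORDS = {
    "r1": "Patient reports fever and coughing for three days.",
    "r2": "Complains of chest pain after exercise.",
    "r3": "Head pain with high temperature since yesterday.",
}


def extract_codes(text):
    """Return a bag (Counter) of standardized codes for known phrases in text."""
    lowered = text.lower()
    codes = Counter()
    # Match longer phrases first so "coughing" is not also counted as "cough".
    for phrase in sorted(TERM_TO_CODE, key=len, reverse=True):
        count = lowered.count(phrase)
        if count:
            codes[TERM_TO_CODE[phrase]] += count
            lowered = lowered.replace(phrase, " ")
    return codes


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, records, top_k=2):
    """Rank records by similarity to the query text; ties are broken by record id."""
    query_vec = extract_codes(query)
    scored = [(cosine(query_vec, extract_codes(text)), rid)
              for rid, text in records.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(rid, score) for score, rid in scored[:top_k]]


if __name__ == "__main__":
    for rid, score in most_similar("High temperature, cough and headache.", RECORDS):
        print(rid, round(score, 3))
```

Mapping different surface phrases ("head pain", "headache") to one code before vectorizing is what lets records written in different colloquial styles be compared. A practical system would replace the phrase table with the terminology extraction and LOINC lookup described above.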