
The Research On Key Technology Of Text Retrieval Of Chinese Electronic Medical Record

Posted on: 2022-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Yang
GTID: 1484306566992169
Subject: Biomedical engineering
Abstract/Summary:
With the continuous development of information technology, hospital informatization has also advanced rapidly. Electronic Medical Record (EMR) systems are now deployed and widely used in hospitals at all levels in China. An EMR stores, in electronic form, all the medical records generated for a patient in the hospital; these records are the core content of diagnosis and treatment and play a pivotal role. As hospital treatment activities continue, the amount of data stored in EMR systems grows day by day. On the one hand, this large volume of data provides comprehensive information for clinical staff. On the other hand, it poses a significant obstacle to rapid and accurate access to specific information. For this reason, text retrieval (TR), a branch of information retrieval (IR), has been introduced into EMR systems. With text retrieval, clinical staff can quickly find the information they need in massive patient data, which provides sufficient auxiliary decision-making information for medical diagnosis, improves the efficiency and quality of clinical diagnosis and treatment, and promotes scientific research in medicine.

However, because of the distinctive language and structure of Chinese EMRs, existing TR algorithms cannot achieve satisfactory retrieval results. Specifically, there are three main problems:

(1) To make EMR text retrieval results more comprehensive, query expansion (QE) is widely used in EMR retrieval. Adding expansion terms related to the original query yields more comprehensive results, which can significantly improve the recall of EMR retrieval and meet users' retrieval needs. However, existing QE algorithms may extract irrelevant terms and assign them unreasonable weights, which can cause serious query drift: the query expansion algorithm brings in a large number of irrelevant results, and retrieval accuracy declines sharply.

(2) Most related research on QE algorithms adds expansion terms and their weights directly to the original query and uses traditional retrieval algorithms to calculate the retrieval scores of EMR documents. This process focuses only on query reformulation and ignores optimization of the retrieval process. Experimental results show that a certain degree of query drift remains even when high-quality expansion terms and weights are used.

(3) Ranking the retrieval results is an important step of a retrieval algorithm, but most existing ranking algorithms for EMR retrieval calculate document scores from word frequencies alone and do not fully consider the structure and language of Chinese EMR documents, which contain multiple fields and negated expressions. As a result, algorithms based on word matching may return results with no clinical significance, which harms both retrieval accuracy and recall.

To improve Chinese EMR text retrieval, this research studies the key technologies of Chinese EMR retrieval based on the language and structure of EMRs and on clinical needs. Specifically, it improves the algorithm for expansion term selection and weight assignment, the QE algorithm, and the ranking algorithm. It also designs a Chinese EMR text retrieval system based on the improved retrieval algorithms. As a result, the retrieval algorithms are significantly improved in accuracy and comprehensiveness, which meets the actual retrieval needs of clinical staff and strengthens the role of EMRs in auxiliary diagnosis and treatment and in clinical research. The main results of the research are as follows:

1. To improve the quality of expansion terms and their weights, and thereby the precision and recall of QE in Chinese EMR retrieval, an improved algorithm for expansion term selection and weight assignment is designed based on a Chinese medical knowledge graph and a standard term set. First, the algorithm tests common language models and selects the most suitable one for semantic similarity calculation. Then, it extracts synonyms, hyponyms and hypernyms from the Chinese medical knowledge graph and standard data sets as expansion terms and calculates their weights from semantic similarities, co-occurrence frequencies and category weights. The experimental results show that precision at top 10 increases by at least 9.38% and recall at top 30 increases by at least 55.22% compared with five other algorithms: those based on co-occurrence frequencies, scoring functions, cosine similarities, Kullback-Leibler divergence, and concepts. The improved algorithm for expansion term selection and weight assignment therefore improves the performance of QE to a certain extent.

2. To further reduce the influence of query drift in QE on top of the improved algorithm above, an improved QE algorithm based on retrieval score adjustment and re-ranking is designed. The algorithm first limits the retrieval scores contributed by expansion terms and then tests re-ranking with disease terms, symptom terms, treatment process terms, and drug terms. A preliminary experiment tests the performance of re-ranking with various combinations of these terms, and treatment process terms and drug terms are selected to calculate the re-ranking scores on the basis of their performance. The final ranking scores are calculated by combining the adjusted ranking scores with the re-ranking scores. The experimental results show that, compared with the improved algorithm above, the improved QE algorithm raises precision at top 10 by 40%, recall at top 30 by 24.8%, and Mean Average Precision (MAP) at top 30 by 19.8%, which significantly improves the effect of QE.

3. On the basis of the improved algorithms above, and to further improve retrieval accuracy, a ranking algorithm for Chinese EMR retrieval based on field weights and negation relation detection is designed. Building on the improved QE algorithm, field weights and negation relations in Chinese EMRs are introduced into the retrieval score calculation. The test results show that precision at top 10, recall at top 30 and MAP at top 30 increase by 10.2%, 32.1% and 45.0%, respectively, compared with the improved QE algorithm, so the effect of the retrieval algorithm is further improved.

4. A Chinese EMR retrieval system based on the improved retrieval algorithms has been developed. First, the system provides a data index module based on PostgreSQL, CountVectorizer and lil_matrix. Second, full-text search and advanced retrieval modules are designed based on the improved retrieval algorithms. Third, a retrieval suggestion module and an error correction module are designed based on users' search behavior. With these features, the system is considerably more intelligent, which enhances the user experience.

In this study, some key technologies of Chinese EMR text retrieval were investigated. Based on the clinical needs of retrieval and the language and structural characteristics of Chinese EMR documents, the algorithm for expansion term selection and weight assignment, the QE algorithm and the ranking algorithm were optimized and improved, and a retrieval system was built on the improved algorithms. Compared with traditional retrieval algorithms, the algorithms designed in this study significantly reduce the impact of query drift and improve the accuracy and comprehensiveness of retrieval results. The retrieval system, designed around clinical needs, offers users diversified functions and provides a more convenient tool for clinical diagnosis and treatment and for clinical research.
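
The abstract does not give the formulas behind expansion term weighting or the limit placed on scores from expansion terms, so the following Python sketch is only one possible reading of those two ideas. Everything in it is an assumption made for illustration: the names `ExpansionTerm`, `expansion_weight` and `document_score`, the category weight values, the linear blend controlled by `alpha`, the fixed `cap`, and the English placeholder tokens standing in for segmented Chinese text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder category weights: the abstract names these three categories
# but does not report the values, so the numbers here are assumptions.
CATEGORY_WEIGHT: Dict[str, float] = {"synonym": 1.0, "hyponym": 0.8, "hypernym": 0.5}


@dataclass
class ExpansionTerm:
    term: str
    category: str        # "synonym", "hyponym" or "hypernym"
    similarity: float    # semantic similarity to the query term, in [0, 1]
    cooccurrence: float  # normalised co-occurrence frequency, in [0, 1]


def expansion_weight(t: ExpansionTerm, alpha: float = 0.5) -> float:
    """Assumed combination: category weight times a linear blend of the two signals."""
    blend = alpha * t.similarity + (1.0 - alpha) * t.cooccurrence
    return CATEGORY_WEIGHT[t.category] * blend


def document_score(
    doc_tokens: List[str],
    query_terms: List[str],
    expansions: List[ExpansionTerm],
    cap: float = 1.0,
) -> float:
    """Term-frequency score in which expansion terms add at most `cap` in total."""
    # Score from the user's own query terms (plain term counts).
    original = float(sum(doc_tokens.count(q) for q in query_terms))
    # Weighted score from expansion terms, before limiting.
    expanded = sum(expansion_weight(t) * doc_tokens.count(t.term) for t in expansions)
    # Limiting the expansion contribution is one simple way to curb query drift.
    return original + min(expanded, cap)
```

Under these assumptions, a document that repeats an expansion term many times but never mentions the original query term cannot score higher than `cap`, so it cannot outrank a document that matches the original term and also carries enough expansion evidence to reach the cap. The field weights, negation detection and re-ranking steps described in the abstract are not modelled here.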
Keywords/Search Tags: Electronic Medical Record, Text Retrieval, Knowledge Graph, Query Expansion, Retrieval Ranking