With the rapid growth of data volume in medical information systems and the deepening of research in data-intensive medical science and precision medicine, the use of data mining to discover latent, valuable diagnosis and treatment knowledge in electronic medical records (EMRs) has attracted much attention in recent years. In EMR data mining, patient representation, similarity measurement, clustering, and clustering result extraction are the most basic and critical tasks in mining disease diagnosis and treatment patterns, and their quality directly affects the evaluation and recommendation of results. Patient representation aims to extract effective features from EMRs according to the diversity, temporality, and dynamicity of clinical data and to improve the efficiency of data mining tasks. The similarity measure quantifies the distance between features so that patient similarity can be measured quickly and effectively, which improves the accuracy of clustering results. Based on the similarity measure, clustering automatically divides patients into clusters, and clustering result extraction defines the core area of each cluster in order to extract the most representative diagnosis and treatment results, called typical patterns. Compared with a traditional cluster representative point or cluster center, typical patterns better reflect the complexity of clinical data and enhance the interpretability of mining results. Therefore, by analyzing the EMRs themselves, we study, on the one hand, a typical diagnosis pattern mining method for patient admission information and, on the other hand, mining methods for typical drug use sequences, typical drug use time, and typical treatment patterns (the last by multi-view information fusion) for doctor order information, in order to promote the standardization of clinical diagnosis-treatment business processes. The contributions of this research can be summarized as follows:

(1) A typical diagnosis pattern mining method based on patient admission information. Existing research on disease diagnosis gives insufficient consideration to the semantic relations between disease codes and expresses patients' symptom information inadequately. To address these problems, we propose a data-driven typical diagnosis pattern mining method that combines similarity measurement, unsupervised clustering, and supervised classification. Firstly, we construct the ontology structure of disease codes and measure the similarity of patient diagnostic information through code information content calculation, a code-level similarity measure, and a set-level similarity measure. Next, we apply the affinity propagation (AP) clustering algorithm to extract typical disease co-occurrence patterns, and discuss the relationship between the principal diagnosis and secondary diagnoses (i.e., complications or comorbidities) according to their order and positions in the ontology structure. Then, taking patient demographic, symptom, and laboratory examination information as attribute sets and typical disease co-occurrence patterns as label sets, we use two categories of decision tree classification algorithms to mine typical diagnosis patterns from multiple perspectives. Finally, experimental results on real patient EMRs show that the proposed method can extract highly stable typical disease co-occurrence patterns and highly accurate typical diagnosis patterns, and that it offers a data-driven approach to constructing a clinical diagnosis scheme database.

(2) A typical drug use sequence mining method based on doctor order information. Current research on sequential mining of doctor orders produces complex results with poor interpretability. To address these problems, we propose a typical drug use sequence mining method that accounts for repetition, time inconsistency, and drug combination in doctor order sequences. Firstly, we apply process mining ideas and Markov chain theory to represent patient treatment records as drug set sequences. We also design a new similarity measure, which is proved to satisfy the non-negativity, symmetry, and triangle inequality properties of a distance measure. Then, we use a clustering algorithm to extract a stable number of typical drug use sequences, and evaluate the results by treatment effect and treatment efficiency. Finally, experimental results on real EMRs show that the proposed similarity measure outperforms existing methods in clustering effect. The multi-level typical drug use sequences extracted by drug name and drug efficacy can not only recommend effective sequential treatment schemes for new patients according to their condition on admission, but also guide the construction and improvement of existing clinical pathways.

(3) A typical drug use time mining method based on doctor order information. In order to discover potential core drugs and their use time patterns from massive EMRs, we propose a typical drug use time mining method that considers the duration characteristics of doctor orders: start time, interval, and end time. Firstly, drawing on statistics that describe the shape of a sample distribution, the method defines drug use time distribution characteristics and patient treatment records, and designs a similarity measure. Then, it adopts a clustering algorithm to extract typical drugs and their effective use time, and the extracted results are evaluated and annotated with disease codes according to patient admission information and treatment outcomes. Finally, experimental results on real EMRs show that the proposed method can extract the most representative typical drug use time patterns, and that the patterns found to have effective treatment results after evaluation can contribute to the prediction and recommendation of drug use time during patient treatment.

(4) A typical treatment pattern mining method based on multi-view information fusion of doctor orders. In order to obtain treatment regimens with high interpretability, rich information, and rational drug use, we propose a typical treatment pattern mining method based on multi-view information fusion. Firstly, according to six attributes of doctor orders (drug name, efficacy, route of administration, dose, daily frequency, and start-end time), we analyze the content, temporality, and duration of doctor orders, and design a representation method and a similarity measure for patient treatment records. Then, we adopt the multi-view similarity network fusion method to integrate the three-view similarity, and use the spectral clustering algorithm to cluster the records and extract typical treatment patterns. Finally, experimental results on real EMRs show that the proposed multi-view similarity measure outperforms single-view measures, linear combination, and existing methods in clustering effect. From the three views of doctor orders, we can extract core drugs, route of administration, daily dose, number of drug uses, and drug use time, which contribute to the "Five Rights" goals of rational drug use: the right drug, the right dose, the right time of administration, the right route of administration, and the right patient.

In terms of theoretical contributions, considering the variety, temporality, and incompleteness of EMRs, this thesis designs typical diagnosis and treatment pattern mining methods that cover EMR data preprocessing, patient representation, similarity measurement, clustering, and clustering result extraction and evaluation. In terms of practical value, by applying the designed methods to EMRs, this thesis can mine the most representative disease diagnoses and treatment regimens and assist the formulation of standardized clinical diagnosis-treatment business processes.
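
As a rough illustration of the similarity step in contribution (1), the sketch below computes information content over a small code hierarchy, a code-level similarity, and a set-level similarity between two patients' diagnosis sets. It is not the thesis's formulation, which the abstract does not specify. It assumes a Lin-style code similarity and a best-match-average set similarity as stand-ins, and the simplified hierarchy (`PARENT`), the corpus counts (`COUNTS`), and all function names are hypothetical.

```python
import math

# Toy is-a hierarchy (child -> parent), loosely shaped like ICD-10 blocks.
# Assumption: a simplified tree for illustration only, not the thesis's ontology.
PARENT = {
    "I10": "I10-I15", "I20": "I20-I25", "I21": "I20-I25",
    "I10-I15": "I00-I99", "I20-I25": "I00-I99",
    "E10": "E10-E14", "E11": "E10-E14", "E10-E14": "E00-E90",
    "I00-I99": "ROOT", "E00-E90": "ROOT",
}

# Hypothetical number of patients carrying each leaf code (made-up values).
COUNTS = {"I10": 40, "I20": 20, "I21": 10, "E10": 5, "E11": 25}


def ancestors(code):
    """Return the code followed by its ancestors, ending at the root."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def information_content():
    """IC(c) = log(total / freq(c)); freq(c) includes all descendants of c."""
    total = sum(COUNTS.values())
    freq = {}
    for leaf, n in COUNTS.items():
        for node in ancestors(leaf):
            freq[node] = freq.get(node, 0) + n
    return {node: math.log(total / f) for node, f in freq.items()}


IC = information_content()


def code_similarity(a, b):
    """Lin-style similarity: 2 * IC(lowest common ancestor) / (IC(a) + IC(b)).

    Only codes that occur in COUNTS (or their ancestors) are supported.
    """
    if a == b:
        return 1.0
    anc_b = set(ancestors(b))
    # In a tree, the first shared node on a's path to the root is the lowest one.
    lca = next(node for node in ancestors(a) if node in anc_b)
    denom = IC[a] + IC[b]
    return 2 * IC[lca] / denom if denom > 0 else 0.0


def set_similarity(codes_a, codes_b):
    """Best-match average: each code is paired with its closest code in the other set."""
    if not codes_a or not codes_b:
        return 0.0
    best_a = sum(max(code_similarity(a, b) for b in codes_b) for a in codes_a)
    best_b = sum(max(code_similarity(a, b) for a in codes_a) for b in codes_b)
    return (best_a + best_b) / (len(codes_a) + len(codes_b))


if __name__ == "__main__":
    patient_1 = {"I20", "E11"}
    patient_2 = {"I21", "E10"}
    print(round(set_similarity(patient_1, patient_2), 3))
```

A pairwise matrix of such set-level similarities is the kind of input that a clustering algorithm such as affinity propagation accepts, which is where the typical disease co-occurrence patterns described in contribution (1) would be extracted.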