| In this era of pervasive informatization, medical informatization has developed rapidly, and the volume of clinical data it generates, especially electronic medical records (EMRs), grows day by day. Unfortunately, the unstructured text in EMRs, which is valuable medical data, has not been fully analyzed or utilized, resulting in serious data waste. Esophageal cancer is one of the most common malignant tumors in China. Its morbidity and mortality rank sixth and fourth among all malignant tumors, respectively, seriously affecting the health and quality of life of Chinese residents. These conditions pose a new challenge to medical informatization: given the huge volume of EMR data for patients with esophageal cancer, how can it be fully processed and used? Solving this problem can not only improve the storage quality and management efficiency of EMRs, but also support related research by medical personnel, the modeling of patients' health status, and personalized assisted medical care. This motivates the research topic of this paper: knowledge mining of esophageal cancer EMR text data. This paper attempts to build an esophageal cancer EMR data mining system that spans text information extraction through cancer stage prediction modeling. Toward this goal, the paper carries out the following work: (1) A text sequence annotation tool is designed. Based on the application scenarios and requirements of this work, and on predefined esophageal cancer medical entity types and annotation specifications, an EMR text sequence annotation tool was developed to provide basic data support for tasks such as medical entity recognition.
(2) A pipelined EMR text information extraction process is proposed, comprising medical entity recognition and entity relation extraction. First, medical entity recognition is implemented with a conditional random field method. Then, based on the structure and linguistic characteristics of medical record text, a rule-based entity relation extraction method is designed, completing the information extraction of esophageal cancer EMRs efficiently. (3) Graph visualization of the history-of-present-illness records of esophageal cancer patients is realized. The entity recognition and relation extraction results above are integrated, as computing services, into the esophageal cancer precision medicine system platform for graph visualization, enabling rapid conversion of new records from text to graph. (4) An esophageal cancer staging prediction model based on the flexible neural tree (FNT) is constructed. A combination of the FNT network and the floating centroid method is proposed, with part of the entity recognition results used as the model's input features; this retains the automatic feature selection ability while also improving classification performance. |
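
To make the pipelined extraction in item (2) more concrete, the following is a minimal sketch of how sequence labels from an entity recognizer might be turned into entities and then linked by a simple rule. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the tag scheme, the entity types, or the rules, so the BIO tags, the `SYMPTOM` and `DURATION` types, the "nearest preceding entity in the same clause" rule, and the function names `decode_bio` and `link_by_rule` are all hypothetical. The English toy sentence is invented for readability, and the tags stand in for the output of a trained conditional random field.

```python
from typing import List, Tuple

Entity = Tuple[str, str, int]  # (entity text, entity type, start token index)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Collect entity spans from BIO tags (assumed scheme: B-X, I-X, O)."""
    entities: List[Entity] = []
    current = None  # [list of tokens, entity type, start index]
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag.startswith("B-"):
            if current:  # close the span that was open
                entities.append((" ".join(current[0]), current[1], current[2]))
            current = [[tok], tag[2:], i]
        elif tag.startswith("I-") and current and current[1] == tag[2:]:
            current[0].append(tok)  # extend the open span
        else:
            if current:  # an O tag (or a stray I- tag) ends the span
                entities.append((" ".join(current[0]), current[1], current[2]))
            current = None
    if current:
        entities.append((" ".join(current[0]), current[1], current[2]))
    return entities


def link_by_rule(tokens: List[str], entities: List[Entity],
                 head_type: str = "SYMPTOM", mod_type: str = "DURATION",
                 stops: Tuple[str, ...] = (".", ";")) -> List[Tuple[str, str, str]]:
    """Hypothetical rule: attach each modifier entity to the nearest
    preceding head entity, unless clause-ending punctuation lies between."""
    relations = []
    for text, etype, start in entities:
        if etype != mod_type:
            continue
        heads = [e for e in entities if e[1] == head_type and e[2] < start]
        if not heads:
            continue
        head = max(heads, key=lambda e: e[2])  # nearest preceding head
        if any(t in stops for t in tokens[head[2]:start]):
            continue  # head and modifier are in different clauses
        relations.append((head[0], "has_" + mod_type.lower(), text))
    return relations


# Invented toy input; the tags play the role of CRF predictions.
tokens = ["progressive", "dysphagia", "for", "3", "months", ".",
          "weight", "loss", "for", "1", "month", "."]
tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-DURATION", "I-DURATION", "O",
        "B-SYMPTOM", "I-SYMPTOM", "O", "B-DURATION", "I-DURATION", "O"]

entities = decode_bio(tokens, tags)
relations = link_by_rule(tokens, entities)
```

Triples of this form (head entity, relation, tail entity) are the kind of structure a graph visualization such as the one in item (3) could consume, although the abstract does not describe the actual data format used by the platform.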