| Ultrasound reports record the results of patients’ ultrasound examinations, including image descriptions and doctors’ diagnoses. They contain a great deal of important clinical information and are an essential source for medical research. Doctors usually write these reports in natural language so that they can describe a patient’s condition accurately. As a result, most of the content of an ultrasound report is unstructured text that computers cannot analyze directly, which impedes information mining and knowledge discovery. To remove this obstacle, ultrasound reports need to be structured. Current research on text structuring focuses mainly on information extraction. However, the distinctive syntax of ultrasound reports makes it difficult to extract relations from them. To address this problem, this paper proposes a novel structuring method that combines traditional information extraction, syntactic parsing, and the semantic features of ultrasound reports. The method uses dependency parsing to obtain the semantic relations among the components of each sentence. Dependency trees are then built from the parsing results, and key-value tuples of medical indices are extracted from these trees. Once the structured results are available, they can readily be analyzed by computer. The main work of the paper is as follows. First, the paper summarizes recent research on structuring Chinese natural language text, including entity relation extraction, dependency parsing, and machine learning methods, and introduces the current state of research on synonym recognition and annotation. The second part of the paper describes the principles of the two software tools used 
in this research: Word2vec and HanLP. Second, the paper introduces the framework of the structuring process based on dependency parsing and the main functions of its core modules. In the preprocessing module, a synonym lexicon is constructed with neural network language models to resolve synonymy. Dependency trees are then generated from the preprocessed ultrasound reports in order to extract medical examination indices. The paper also uses short-sentence segmentation and annotation as optimization strategies that simplify the dependency trees, which clarifies the grammatical relations in the medical text and improves the quality of the structured results. Key-value pairs of medical examination indices are thus extracted from the ultrasound reports, and structured text is generated automatically. In the post-processing module, noisy data is corrected, which further improves the accuracy and extensibility of the structuring process. Finally, experiments on real pathological report data sets show that the proposed method is effective and general. The extraction of medical indices and of their values from ultrasound reports reaches accuracies of 82.91% and 79.11%, respectively, which provides a solid foundation for future related studies. |
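
As a rough illustration of how key-value tuples might be read off a dependency tree, the following minimal sketch walks a hand-written parse and pairs each subject with the predicate it depends on. It is not the paper's implementation: the `Token` class, the `extract_pairs` function, the English example sentence, and the labels `SBV` (subject), `ATT` (attributive modifier), `HED` (root) and `COO` (coordination) are all assumptions chosen for illustration, and no real parser is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    word: str
    head: int    # idx of the head token, 0 for the root
    deprel: str  # dependency label (assumed tag set, see lead-in)


def extract_pairs(tokens):
    """Return (index, value) tuples from one parsed sentence."""
    by_idx = {t.idx: t for t in tokens}
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t)

    def modifiers(tok):
        # Collect attributive modifiers of tok, in sentence order.
        parts = []
        for child in sorted(children.get(tok.idx, []), key=lambda c: c.idx):
            if child.deprel == "ATT":
                parts.extend(modifiers(child))
                parts.append(child.word)
        return parts

    pairs = []
    for t in tokens:
        if t.deprel == "SBV":
            # The subject (with its modifiers) is the index name,
            # and the predicate it attaches to is the value.
            key = " ".join(modifiers(t) + [t.word])
            pairs.append((key, by_idx[t.head].word))
    return pairs


# Hand-written parse of "liver size normal, gallbladder wall smooth".
example = [
    Token(1, "liver", 2, "ATT"),
    Token(2, "size", 3, "SBV"),
    Token(3, "normal", 0, "HED"),
    Token(4, "gallbladder", 5, "ATT"),
    Token(5, "wall", 6, "SBV"),
    Token(6, "smooth", 3, "COO"),
]
pairs = extract_pairs(example)
print(pairs)  # [('liver size', 'normal'), ('gallbladder wall', 'smooth')]
```

Splitting a report into short sentences before parsing, as the abstract describes, keeps each tree this shallow, so a simple subject-to-predicate rule of this kind has fewer long-distance attachments to get wrong.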