Thyroid nodules are a common clinical problem, and clinical examination is used to determine whether a patient has thyroid cancer. Ultrasound is the preferred examination method, but an accurate diagnosis also requires pathological examination, whose report is recognized as the gold standard. Because the pathology report directly reflects the patient's condition, its diagnostic conclusion can be used to infer whether the findings in the patient's ultrasound images are benign or malignant. This matters for annotating thyroid ultrasound images: computer-aided diagnosis systems for thyroid nodules based on ultrasound images use a deep neural network as the main benign/malignant classification model, and training this model requires many annotated ultrasound images. Manual annotation demands expert knowledge, is time-consuming and laborious, and is inefficient, so automatic annotation of thyroid ultrasound images is needed. A computer can automatically analyze a patient's pathology report to obtain a benign or malignant label and then apply that label to the patient's thyroid ultrasound images. This paper therefore recasts the problem of automatically labeling thyroid ultrasound images as the problem of automatically labeling thyroid pathology reports, that is, how to analyze a thyroid pathology report automatically and obtain a benign or malignant label.

By analyzing the steps doctors follow when reading a thyroid pathology report to reach a diagnostic conclusion, this paper constructs an automatic annotation framework for thyroid pathology reports. The framework has three modules. First, based on the typesetting and character characteristics of pathology report text, an OCR system built on CTPN and CNN-CTC recognizes the diagnostic text in the report. Second, a comparison of the text of thyroid pathology reports and ultrasound reports reveals the semantic correlation between the two; a thyroid-specific sentence embedding model is trained with the ultrasound reports as the corpus and is used to encode the diagnostic text of the pathology report into a sentence vector. Third, a DNN-based benign/malignant classification model classifies these sentence vectors. For each module, algorithms suited to the characteristics of thyroid pathology reports are designed, and their effectiveness is verified through comparative experiments. The experiment on classifying the diagnostic-text sentence vectors also achieves good results, indicating that the framework can analyze thyroid pathology reports automatically and obtain accurate benign and malignant labels, thereby realizing automatic labeling of thyroid pathology reports.
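
The data flow of the framework can be illustrated with a minimal Python sketch. It shows only how the three stages chain together and how the label derived from a pathology report is propagated to the same patient's ultrasound images. All names here (`PatientRecord`, `label_ultrasound_images`, and the `toy_*` functions) are hypothetical and do not come from the paper; the toy stages are trivial stand-ins, not the CTPN and CNN-CTC OCR system, the thyroid sentence embedding model, or the DNN classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stage signatures. The real models described in the abstract
# would be plugged in behind these interfaces.
OcrFn = Callable[[bytes], str]                  # report image -> diagnostic text
EmbedFn = Callable[[str], Sequence[float]]      # diagnostic text -> sentence vector
ClassifyFn = Callable[[Sequence[float]], str]   # sentence vector -> label


@dataclass
class PatientRecord:
    """Hypothetical container linking one report to a patient's images."""
    patient_id: str
    pathology_report_image: bytes
    ultrasound_image_paths: List[str]


def label_ultrasound_images(
    record: PatientRecord,
    ocr: OcrFn,
    embed: EmbedFn,
    classify: ClassifyFn,
) -> Dict[str, str]:
    """Run OCR -> embedding -> classification, then copy the report's
    label onto every ultrasound image of the same patient."""
    text = ocr(record.pathology_report_image)
    vector = embed(text)
    label = classify(vector)
    return {path: label for path in record.ultrasound_image_paths}


# Toy stand-ins so the sketch runs end to end (assumptions, not the paper's models).
def toy_ocr(image: bytes) -> str:
    # Pretends the "image" already holds the diagnostic text as UTF-8 bytes.
    return image.decode("utf-8")


def toy_embed(text: str) -> List[float]:
    # Two-dimensional keyword-count "vector"; a real system would use a
    # trained sentence embedding model.
    lowered = text.lower()
    return [float(lowered.count("carcinoma")), float(lowered.count("benign"))]


def toy_classify(vector: Sequence[float]) -> str:
    # Stand-in decision rule in place of a trained DNN classifier.
    return "malignant" if vector[0] > vector[1] else "benign"


record = PatientRecord(
    patient_id="p001",
    pathology_report_image=b"Papillary thyroid carcinoma",
    ultrasound_image_paths=["p001_a.png", "p001_b.png"],
)
labels = label_ultrasound_images(record, toy_ocr, toy_embed, toy_classify)
print(labels)
```

Passing the stages as callables keeps the sketch independent of any particular OCR or embedding library, so each placeholder could be replaced by a trained model without changing the labeling step.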