| Objective: Longitudinal follow-up imaging is critical to clinical decision-making and the management of patients with brain tumors. This study aimed to develop a fully automated, deep learning-based natural language processing (NLP) model to extract clinically relevant brain tumor characteristics from free-text magnetic resonance imaging reports, and to identify the best-performing deep learning model for this task. Methods: Imaging reports from a two-institution cohort of brain tumor patients were manually annotated for (1) disease presence and (2) cancer stability using a modified PRISSMM framework. The data were randomly split into training, validation, and test sets at a 7:2:1 ratio. Seven deep learning models were built: a one-dimensional convolutional neural network (CNN), a recurrent neural network (RNN), a gated recurrent unit (GRU) network, a long short-term memory (LSTM) network, ClinicalBERT, BlueBERT, and ELECTRA. The validation set was used to evaluate classification performance during training and to tune model parameters. Models were compared by weighted F1 score on the unseen test set, and the model with the highest score was selected. Agreement between the deep NLP model and manual classification was assessed with Cohen’s kappa. Multivariable Cox proportional hazards regression was used to evaluate the association between report-derived tumor imaging features and overall survival. Results: A total of 10,006 free-text radiology reports from 1,580 patients were included. Kappa between human annotators was 0.77 and 0.80. Kappa between the model and human annotation ranged from 0.78 to 0.80, indicating good agreement. The ELECTRA transformer model performed best on the unseen test set for both classification tasks, with weighted F1 score, AUC, sensitivity, and specificity of 0.910, 0.96, 0.85, and 0.94, respectively, for disease presence, and 0.925, 0.96, 0.76, and 0.98 for tumor stability. Within each target class, overall survival did not differ significantly between machine and manual classification. Reports classified as cancer present (HR: machine, 2.74; manual, 2.84) or progressing disease (HR: machine, 2.25; manual, 2.12) were associated with increased mortality. Conclusions: ELECTRA performed best among the seven deep NLP models. The deep NLP model was able to synthesize findings from unstructured imaging reports and risk-stratify patients. |
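The 7:2:1 random split in the abstract can be sketched with Python's standard library. This is a minimal illustration, not the authors' code. The function name `split_7_2_1`, the fixed seed, and the rounding of set sizes are all assumptions. If you pass report IDs, you get a report-level split. If you pass patient IDs, you get a patient-level split, which keeps all of one patient's reports in the same set. The abstract does not say which unit was used.

```python
import random


def split_7_2_1(units, seed=0):
    """Randomly split units (e.g. report IDs or patient IDs) into
    train/validation/test lists at roughly a 7:2:1 ratio."""
    units = list(units)
    rng = random.Random(seed)  # fixed seed for a reproducible split (assumption)
    rng.shuffle(units)
    n = len(units)
    n_train = round(0.7 * n)
    n_val = round(0.2 * n)
    train = units[:n_train]
    val = units[n_train:n_train + n_val]
    test = units[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


if __name__ == "__main__":
    # 10,006 hypothetical report IDs, matching the cohort size in the abstract
    train, val, test = split_7_2_1(range(10006))
    print(len(train), len(val), len(test))  # 7004 2001 1001
```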
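The weighted F1 score used for model selection and the Cohen’s kappa used for agreement can both be computed from label sequences alone. The sketch below is written in plain Python so that each definition is explicit:

- **Weighted F1:** the per-class F1 score, weighted by each class's share of the true labels.
- **Cohen’s kappa:** observed agreement corrected for the agreement expected by chance, given each rater's label frequencies.

The function names and the labels `"present"`/`"absent"` are hypothetical. The abstract does not give the exact label set of the modified PRISSMM annotation.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's
    support (its count in y_true)."""
    n = len(y_true)
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += support[c] / n * f1  # classes absent from y_true get weight 0
    return total


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if p_expected == 1:
        return float("nan")  # both raters used one identical label: kappa undefined
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical manual vs. model labels for five reports
    manual = ["present", "present", "absent", "absent", "present"]
    model = ["present", "absent", "absent", "absent", "present"]
    print(weighted_f1(manual, model))   # ~0.8
    print(cohens_kappa(manual, model))  # ~0.615
```

Here the observed agreement is 0.8 and the chance agreement is 0.48, so kappa is (0.8 − 0.48) / (1 − 0.48).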
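Sensitivity, specificity, and AUC are defined for a binary decision. The sketch below treats one class as positive and all others as negative (one-vs-rest). This is an assumption, because the abstract does not state how these metrics were aggregated for a task with more than two classes. The AUC is computed as the Mann-Whitney probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. Function names and the toy data are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity (recall of the positive class) and specificity
    (recall of everything else) for a one-vs-rest binary view."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(y_true, scores, positive):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0]                 # hypothetical truth (1 = positive class)
    scores = [0.9, 0.4, 0.6, 0.1]    # hypothetical model probabilities
    preds = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold (assumption)
    print(sensitivity_specificity(y, preds, positive=1))  # (0.5, 0.5)
    print(auc_mann_whitney(y, scores, positive=1))        # 0.75
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can report a high AUC alongside a lower sensitivity.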
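A hazard ratio such as the reported 2.74 for "cancer present" is exp(β), where β is a Cox model coefficient. The study used multivariable Cox regression with covariates the abstract does not list. The sketch below is therefore a deliberate simplification: it fits a single binary covariate by Newton-Raphson on the Cox partial likelihood, handling tied event times with the Breslow approximation. The function names and toy survival data are hypothetical. In practice a dedicated survival package (for example, lifelines in Python) would be used.

```python
import math


def cox_score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Cox partial
    log-likelihood for one covariate (Breslow handling of ties)."""
    score = 0.0
    info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects only enter through risk sets
        # Risk set at t_i: everyone still under observation (t_j >= t_i)
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0  # risk-weighted covariate mean in the risk set
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def cox_hazard_ratio(times, events, x, max_iter=50, tol=1e-9):
    """Newton-Raphson fit of a univariable Cox model; returns (beta, HR).
    Assumes the covariate varies within risk sets (info > 0)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


if __name__ == "__main__":
    # Hypothetical data: months to death/censoring, event flag, and
    # x = 1 if the report was classified as "cancer present"
    times = [2, 3, 5, 6, 8, 3, 4, 7, 9, 10, 12]
    events = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1]
    x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    beta, hr = cox_hazard_ratio(times, events, x)
    print(f"beta={beta:.3f}  HR={hr:.2f}")  # HR > 1: higher hazard when x = 1
```

An HR above 1 means a higher instantaneous risk of death for the x = 1 group. Comparing HRs fitted on machine labels with HRs fitted on manual labels is the comparison the abstract reports.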