| Over the past decade, clinical entity recognition, a basic task in clinical text processing, has attracted widespread attention. However, most research focuses on clinical texts in English rather than in other languages. Recently, with the development of Internet technology and the spread of electronic medical records, a growing volume of Chinese medical record text has become available, which makes it possible to support smart healthcare through text information processing and to deliver efficient and accurate medical services. This paper uses named entity recognition to study information extraction from electronic medical records. Breast cancer is the most common malignant tumor among women worldwide and places a considerable burden on patients and the public health system. Early detection and treatment of breast cancer can greatly reduce mortality and the economic burden on patients. Clinical guidelines therefore recommend regular mammography to assess breast cancer risk. This research aims to develop natural language processing methods that extract Breast Imaging Reporting and Data System (BI-RADS) findings from Chinese mammography reports, in order to support clinical operations and breast cancer research in China. The main work of this paper is as follows: (1) Following the Chinese guidelines for the diagnosis and treatment of breast cancer, this paper annotates the BI-RADS findings in Chinese mammography reports. Based on this labeled data set, it develops several natural language processing models to extract BI-RADS findings from Chinese mammography reports, including a Hidden Markov Model (HMM), a Conditional Random Field (CRF), and BiLSTM models. On the basis of the BiLSTM model, this paper proposes a BiLSTM-Attention model with an attention mechanism and a BiLSTM-Highway model with a highway network. Of these, the BiLSTM-Highway model can fuse pre-trained word embeddings and achieves good experimental results. (2) This paper proposes a
neural network structure, IDCNN-BiLSTM-Highway. It uses a CNN to quickly extract local features between Chinese characters, a BiLSTM to capture long-term dependencies in the text, and a highway network to fuse pre-trained word embeddings, and can thereby obtain an effective feature representation of the text sequence. To better integrate the pre-trained word embeddings, this paper modifies the transform gate structure so that the relationships between features are considered comprehensively as information propagates through the highway layer. Experiments on two data sets show the effectiveness of both the model structure and the modified transform gate structure. |
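
For readers unfamiliar with highway layers, the following is a minimal sketch of the standard formulation, y = T(x) * H(x) + (1 - T(x)) * x, where T is the transform gate. It is not the modified transform gate proposed in this paper, whose exact form is not given in the abstract. The function names, the tanh candidate transform, and the plain-list weight layout are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with the weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer (illustrative; not the paper's modified gate).

    x        : input feature vector
    w_h, b_h : parameters of the candidate transform H(x) = tanh(W_h x + b_h)
    w_t, b_t : parameters of the transform gate T(x) = sigmoid(W_t x + b_t)
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]  # candidate features
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]    # gate values in (0, 1)
    # Each output dimension mixes the transformed and the untouched input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

When the gate is close to 0 the layer passes its input through unchanged, and when it is close to 1 the layer outputs the transformed features. This gating is what lets a highway layer decide, per dimension, how much of one representation to carry forward.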