
Research On Short Text Classification Algorithm Of Obstetric Electronic Medical Record Based On BERT And CNN

Posted on: 2021-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2404330647960153
Subject: Computer technology
Abstract/Summary:
Obstetric electronic medical records are the main channel through which doctors gain a full understanding of the condition of pregnant women and fetuses, and they are of great significance for improving the reproductive health of the population. Structured processing is an important method for mining information from the unstructured text in electronic medical records, and it improves the efficiency of medical staff. Text classification is a key module of the structuring function and plays a crucial role in the final structuring result. The rapid development of deep learning brings more possibilities for solving text classification tasks, so it is of great practical value to study how to combine new technologies with existing solutions to further improve their accuracy. Using a text data set of delivery records from obstetric electronic medical records, this thesis proposes a short-text (sentence-level) classification algorithm for the six categories in the delivery record. The algorithm is improved in three respects: (1) The BERT pretrained language model is used to produce the vectorized representation of each sentence, which avoids the heavy reliance of traditional Chinese word vectors on the word segmentation algorithm and improves the ability of the feature vector to express the text context; (2) Obstetric medical record text is written irregularly and its sentence boundaries are difficult to determine, so a sequence labeling method based on Bi-LSTM-CRF is applied to the sentence segmentation task, which strengthens sentence segmentation in the data preprocessing stage; (3) A convolutional neural network with multiple convolution layers is used as the classifier, which enhances the model's ability to extract upper-level features. The experimental results show that the proposed BERT + CNN model reaches an F1 value of 94% on the text classification task for obstetric electronic medical records, about 6% higher than the benchmark model TextCNN, and the F1 gap can reach 10% on smaller data sets. The F1 value of the sentence segmentation algorithm reaches about 80%, and using Bi-LSTM + CRF yields better results. This thesis applies technology that has been among the most popular in natural language processing in recent years to improve traditional text classification, provides more options for structuring solutions, and offers a reference for future related research.
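To make the sentence segmentation step in point (2) more concrete, here is a minimal sketch of how the output of a sequence labeler can be turned into sentences. It is not the thesis's implementation: the abstract does not give the label scheme, so the two-tag scheme ("B" starts a sentence, "I" continues it), the function name `split_by_labels`, and the example tokens are all assumptions made for illustration. In practice the labels would be predicted by a model such as Bi-LSTM-CRF; here they are written by hand.

```python
from typing import List


def split_by_labels(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Group tokens into sentences using one boundary label per token.

    Assumed label scheme (hypothetical, not taken from the thesis):
      "B" = the token begins a new sentence
      "I" = the token continues the current sentence
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    sentences: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label not in ("B", "I"):
            raise ValueError(f"unknown label: {label!r}")
        # A "B" closes the sentence collected so far and opens a new one.
        if label == "B" and current:
            sentences.append(current)
            current = []
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Invented example: an unpunctuated note that contains two sentences.
tokens = ["labor", "started", "at", "night", "infant", "delivered", "safely"]
labels = ["B", "I", "I", "I", "B", "I", "I"]
sentences = split_by_labels(tokens, labels)
```

Each resulting sentence would then be passed to the classifier. Framing segmentation as labeling lets the model place boundaries without relying on punctuation, which matters when the records are written irregularly.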
Keywords/Search Tags: BERT, Short text classification, Sequence annotation, Obstetric electronic medical records