Research And Implementation Of Chinese Electronic Medical Record Text Semantic Segmentation Method Based On Deep Learning

Posted on: 2023-09-09 | Degree: Master | Type: Thesis
Country: China | Candidate: P L Ai | Full Text: PDF
GTID: 2544307100475704 | Subject: Computer technology
Abstract/Summary:
Electronic medical records have great value for medical knowledge graph construction and clinical diagnosis assistance. If the text of a medical record can be divided into several semantic segments, and the data can then be structured in a targeted manner based on these segments, the accuracy and completeness of the resulting structured knowledge improve. Research on semantic segmentation of Chinese electronic medical records therefore has important practical significance. However, Chinese electronic medical records are highly specialized, follow a particular writing style, differ widely in how the disease course is recorded, and contain many medical terms, with new terms constantly appearing. All of this makes text semantic segmentation challenging. At present, there is no dedicated method or system for semantic segmentation of electronic medical records, so the requirements of practical applications cannot be met. To address these problems, this thesis studies a deep-learning-based semantic segmentation method for Chinese electronic medical records. The main points of the research are as follows.

This thesis proposes a Chinese electronic medical record text semantic segmentation model based on the Transformer Encoder, BERT-wwm-Char Segmentation Transformer (BERT-wwm-CSTR for short). Firstly, based on an analysis of the text content of Chinese electronic medical records, a multi-feature text embedding method built on the BERT-wwm model is designed. This method obtains both the sentence vector and the character vector of each character in the sentence from the pre-trained BERT-wwm model, and then weights the two vectors to produce a weighted vector for each character as the final character embedding. The method retains relatively complete semantic information from the medical record and is better suited to text semantic segmentation. Then, a Random Dropout Self-Attention (RDSA) method is proposed, based on a study of how different semantic segments are distributed in Chinese electronic medical record text and of the structure of the Transformer Encoder's multi-head self-attention. The method uses the self-attention mechanism to calculate the attention scores between each character and the other characters in the text, and then filters out attention between characters of different categories through random dropping, which helps the network retain the attention among characters of the same category. By focusing on characters of the same category in different regions of the text, the network lets diverse intra-class local features complement each other and avoids mutual interference between characters of different categories. Finally, the full-character linear regression method applies linear regression to each character to calculate the weight of that character for each semantic category. A classifier then produces the character-level probability distribution and the optimal prediction, which yields the semantic segmentation of the medical record text.

To verify the effectiveness of the model, the text semantic categories are defined and a Chinese electronic medical record text semantic segmentation dataset is constructed. The experimental results show that the BERT-wwm-CSTR model has strong semantic segmentation ability and still performs well on small-scale datasets. In addition, the BERT-wwm multi-feature text embedding method and the RDSA method complement each other, which improves the segmentation quality of the model. The precision, recall and F1 score of the model reached 86.40%, 87.46% and 86.93% respectively, and the mIoU was 77.38%, the best result among Chinese electronic medical record text semantic segmentation models. The model thus achieves significant semantic segmentation results and provides a new method for semantic segmentation of Chinese electronic medical records.

A Chinese electronic medical record text semantic segmentation system based on the browser/server (B/S) architecture is also designed and implemented. The system interface is simple, friendly and easy to operate, and the system can be used on multiple platforms. It supports text semantic segmentation with the BERT-wwm-CSTR model, and the segmentation results can be viewed and saved.
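
The three model steps described in the abstract (weighted character embedding, random dropping of cross-category attention, and per-character classification) can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation. The function names, the mixing weight `alpha`, the drop probability `drop_prob`, the single attention head without learned projections, and the use of known character labels to decide which attention entries to drop are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def weighted_char_embeddings(char_vecs, sent_vec, alpha=0.5):
    """Blend each character vector with the sentence vector.

    alpha is a hypothetical mixing weight; the abstract only says the
    two vectors are weighted, not how.
    """
    return [[alpha * c + (1 - alpha) * s for c, s in zip(vec, sent_vec)]
            for vec in char_vecs]


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_drop_attention(embs, labels, drop_prob=0.5, rng=None):
    """Single-head scaled dot-product self-attention with random dropping.

    Scores between characters of different categories are masked out
    with probability drop_prob (assumed scheme). A character always
    shares a category with itself, so every row keeps a finite score.
    """
    rng = rng or random.Random(0)
    dim = len(embs[0])
    weights = []
    for i, query in enumerate(embs):
        scores = []
        for j, key in enumerate(embs):
            score = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if labels[i] != labels[j] and rng.random() < drop_prob:
                score = -math.inf  # drop this cross-category attention
            scores.append(score)
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted sum of the embeddings.
    outputs = [[sum(w * v[t] for w, v in zip(row, embs)) for t in range(dim)]
               for row in weights]
    return weights, outputs


def classify_chars(features, weight_rows, biases):
    """Apply one linear layer per character, then pick the likeliest class."""
    predictions = []
    for x in features:
        logits = [sum(w * xi for w, xi in zip(w_row, x)) + b
                  for w_row, b in zip(weight_rows, biases)]
        probs = softmax(logits)
        predictions.append(max(range(len(probs)), key=probs.__getitem__))
    return predictions
```

With `drop_prob=1.0` every cross-category attention entry is removed, so each character attends only to characters of its own category; with `drop_prob=0.0` the function reduces to ordinary self-attention. In a real model the embeddings would come from BERT-wwm and the layer weights would be learned.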
Keywords/Search Tags:EMR, Semantic Segmentation, Deep Learning, Transformer, Self-Attention