Research On Chinese Word Segmentation For Medical Domain

Posted on: 2023-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Yu
GTID: 2544306827475064
Subject: Computer Science and Technology
Abstract/Summary:
Neural Chinese word segmentation (CWS) systems achieve state-of-the-art performance when manually annotated resources are abundant. On specialized text such as the medical domain, however, their performance drops sharply because in-domain annotated data are scarce, and they also tend to perform poorly on low-frequency terms. To address data distribution mismatch and out-of-vocabulary (OOV) word recognition in cross-domain CWS, a lexicon-augmented graph convolutional network (GCN) domain adaptation method is proposed. First, an external dictionary is matched against the input sentence to find candidate words, and a lexicon-based text structure graph is constructed. Second, a GCN models the text structure graph and generates lexicon-based embeddings, which are fed into the CWS model to improve its performance; a traditional Chinese medicine (TCM) domain dictionary is added to the external dictionary to improve segmentation of TCM text. To evaluate the proposed method, a CWS test set for the TCM domain was constructed. Experimental results show that the proposed method outperforms the previous state-of-the-art (SOTA) model on both the in-domain and cross-domain datasets.

As a basic task of Chinese lexical analysis, CWS aims to provide accurate word boundary information for downstream tasks. However, different downstream tasks often require different segmentation criteria. Therefore, a multi-criterion CWS method that integrates lexicon information is proposed to achieve domain adaptation for multi-criterion CWS. The lexicon-augmented GCN module is added to a multi-criterion baseline model, and the construction of the text structure graph is improved to accommodate the prior bias information provided by the multi-criterion segmentation method. Experimental results show that this method improves the F1 score by 0.27% over the baseline system and also brings a clear improvement to an existing advanced multi-criterion CWS method.
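The two steps described in the abstract (dictionary matching, then graph-based propagation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names `match_lexicon`, `build_graph`, and `propagate`, the toy lexicon, the choice to link the first and last character of each matched word, and the use of plain neighbour averaging in place of a trained GCN layer (which would also apply a learned weight matrix and a nonlinearity).

```python
from typing import List, Set, Tuple


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of multi-character lexicon words."""
    spans = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            if sentence[start:end] in lexicon:
                spans.append((start, end))
    return spans


def build_graph(sentence: str, spans: List[Tuple[int, int]]) -> List[List[float]]:
    """Build a character-level adjacency matrix (hypothetical graph design)."""
    n = len(sentence)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0  # neighbouring characters
    for start, end in spans:
        # Link the first and last character of each matched word.
        adj[start][end - 1] = adj[end - 1][start] = 1.0
    return adj


def propagate(adj: List[List[float]], features: List[List[float]]) -> List[List[float]]:
    """One simplified propagation step: average the features of each node's
    neighbours. No learned weights or activation, unlike a real GCN layer."""
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        out.append([
            sum(row[j] * features[j][k] for j in range(len(row))) / degree
            for k in range(dim)
        ])
    return out


# Toy example: a sentence containing a TCM formula name.
sentence = "患者服用六味地黄丸"
lexicon = {"患者", "服用", "六味地黄丸", "地黄"}
spans = match_lexicon(sentence, lexicon)
adj = build_graph(sentence, spans)
embeddings = propagate(adj, [[float(i)] for i in range(len(sentence))])
print(spans)
```

In this sketch the long dictionary entry adds an edge between two characters that are not adjacent in the sentence, which is how lexicon knowledge would reach the character representations before they are passed to a segmentation model.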
Keywords/Search Tags: Neural Network Chinese Word Segmentation, Medical Domain, Domain Adaptation, Graph Neural Network, Multi-Criterion Chinese Word Segmentation