Font Size: a A A

Research On Chinese Word Segmentation Sequence Labeling Method Based On Multi-task Learning

Posted on:2023-11-05Degree:MasterType:Thesis
Country:ChinaCandidate:S N LvFull Text:PDF
GTID:2558306845999359Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Chinese word segmentation(CWS)as a foundational task in the field of Natural Language Processing(NLP),is the focus of academic research,and its effect is a key factor affecting the development of subsequent downstream tasks.In recent years,thanks to the wide application of deep learning in various fields,word segmentation algorithm based on deep learning neural network has received a lot of attention and recognition.The method uses a large amount of annotated data training,which is different from the traditional method using rules or statistical patterns,so the generalization ability of the model is greatly improved.However,there are still some difficulties such as unclear word segmentation specification,ambiguity segmentation and out-of vocabulary recognition.The CWS task has a strong adaptability to the domain.The annotated corpus consumes a lot of resources and many domains cannot provide largescale data at all.Therefore,it is very unrealistic to construct a specific corpus for each domain.The traditional NLP task adopts the pipeline model,which regards CWS as an independent single task and then as the input of subsequent work.This mode cannot avoid the shortcoming of error propagation and cannot effectively realize information resource sharing among tasks,which seriously affects the learning effect of subsequent tasks.Therefore,how to overcome the difficulties of traditional CWS methods,solve the problem of pipeline model error propagation,alleviate the scarcity of domain data resources and improve the adaptability of domain are urgent problems to be solved at present.According to these challenges,this paper proposes a Chinese word segmentation sequence labeling method based on multi-task learning,By combining word segmentation and related sequence labeling tasks,this method breaks the traditional pipeline model barrier,provides channels for information sharing between tasks,alleviates the problem of resource scarcity,and maximizes the use of precious resources.This will achieve multiple tasks to help each other,improving the domain and task adaptability and learning effect.The main contributions of this paper are as follows:(1)This paper analyze the deficiencies of mainstream single-task neural network framework and improve BiGRU-CRF system.The framework of Bert-BiGRU-CRF and Bert-BiGRU-CNN-CRF are obtained by adding the pre-trained language model and combining with the convolutional neural network.This framework focus on contextual information to learn semantic features and receptive field to acquire hidden local features.(2)This paper propose a multi-task learning sequence labeling framework.Named entity recognition and part-of-speech tagging are introduced as auxiliary tasks to form a joint model of sequence labeling tasks.Based on the joint loss,parameter sharing and label consistency,the best mode of multi-task sharing information is obtained by progressive exploration and research.(3)This paper propose a Chinese word segmentation labeling method based on multi-task learning.The three modules of pre-trained labeling,multi-task shared learning and joint training are set up.To solve the problem of scarce domain data resources,the special data set of multi-task learning is constructed.Data sharing between tasks increases resource utilization and improves task and domain adaptability.(4)This paper propose a loss calculation mode based on label consistency.In the training,lexical information is introduced to strengthen the entity boundary and the learning of boundary information is emphasized to achieve the purpose of enhancing vocabulary.The problem of specification and ambiguity segmentation can be effectively alleviated,and the problem of out-of vocabulary recognition can be alleviated.And the effect is more significant in the field of small sample and low resources.The experimental results show the effectiveness of the proposed method and its ability to solve word segmentation difficulties,pipeline model error propagation defects,resource scarcity,and improve the adaptability of the domain,better integration development trend of the field to solve multiple problem with single model.
Keywords/Search Tags:Multi-task learning, Chinese word segmentation, Sequence labeling, Neural network, Natural language processing
PDF Full Text Request
Related items