Research On New Energy Vehicles Named Entity Recognition Based On Multi-feature

Posted on: 2020-08-29  Degree: Master  Type: Thesis
Country: China  Candidate: B F Zhang
GTID: 2492306464995049  Subject: Computer Science and Technology
Abstract/Summary:
New energy vehicle named entity recognition (NER) aims to help the market and society better identify the development direction of new energy vehicles and to make better use of domain texts in promoting technological innovation. NER in this domain is difficult for several reasons: Chinese text contains many ambiguities, entity recognition is sensitive to word segmentation errors, and new energy vehicle texts suffer from fuzzy entity boundaries, widely varying entity lengths, many out-of-vocabulary (unlisted) words, and a shortage of existing annotated corpora. At present, NER for new energy vehicles relies mostly on statistical and domain-knowledge methods that treat entity recognition as a sequence tagging problem, solved with conditional random fields (CRF) and other machine learning methods. On the one hand, these methods require manual feature extraction, which is time-consuming and labor-intensive, and the extracted features vary in quality. On the other hand, they rely on local labels to determine entity boundaries, so they cannot escape the effects of word segmentation errors and boundary ambiguity, which leads to poor recognition results. To improve new energy vehicle NER, this thesis proposes improvements in three areas: text features, the entity recognition model, and the training process.

(1) Text features. A deep neural network is used to fuse character, word, and external features. Introducing character features prevents the feature vectors of out-of-vocabulary words from degenerating into zero vectors, which alleviates the difficulties caused by the abundance of such words. Simple hand-crafted features, such as part of speech, position, and dictionary features, are introduced to address complex internal entity structure and entity nesting.

(2) Entity recognition model. Because entity boundaries in the new energy vehicle domain are fuzzy and entity lengths vary, semi-Markov conditional random fields (SCRF) replace the CRF used in the traditional sequence labeling model. SCRF performs segment feature fusion, segmentation, and entity recognition jointly, overcoming the limitation of traditional sequence labeling models that partition entity boundaries using local labels. Incorporating features of the segment as a whole together with its external features further improves the segment representation.

(3) Training process. Active learning is introduced into model training. An active learning method that combines uncertainty with information density is proposed: in each round it selects the most uncertain unlabeled samples that are not isolated for manual labeling, avoiding wasted effort on sentences the current model already handles well. Learning selectively from unlabeled samples maintains recognition performance while greatly reducing the amount of manual labeling.

A series of comparative experiments was designed to evaluate these improvements. The results show that, compared with the traditional model, deep-learning feature fusion improves recognition performance by 4.37%. Using SCRF to perform segment feature fusion, segmentation, and entity classification jointly improves performance by a further 6.42%. With active learning, labeling only 66% of the samples costs just 0.5% in F1 score. These results demonstrate the effectiveness of the proposed named entity recognition model for new energy vehicles.
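Point (1) of the abstract builds each character's input from character, word, and hand-crafted features. The Python sketch below shows only the input-level concatenation that such a network might consume; the thesis fuses the features with a deep neural network whose architecture is not described here. The embedding tables are random stand-ins, and the tag sets, dimensions, example sentence, and helper names (`fuse_features`, `random_table`, `position_tag`) are hypothetical. It illustrates why character features help with out-of-vocabulary words: the word vector of an unseen word falls back to zeros, while its characters keep non-zero vectors.

```python
import random
from typing import Dict, List, Set

# Hypothetical toy sizes and tag sets; the thesis's real dimensions and tag inventories are not given.
CHAR_DIM = 4
WORD_DIM = 4
POS_TAGS = ["n", "v", "other"]
POSITION_TAGS = ["B", "M", "E", "S"]  # position of a character inside its segmented word


def random_table(keys, dim: int, seed: int) -> Dict[str, List[float]]:
    """Stand-in for pretrained embeddings: one random vector per key."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


def one_hot(value: str, labels: List[str]) -> List[float]:
    return [1.0 if value == label else 0.0 for label in labels]


def position_tag(i: int, length: int) -> str:
    if length == 1:
        return "S"
    if i == 0:
        return "B"
    return "E" if i == length - 1 else "M"


def fuse_features(words, pos_tags, char_table, word_table, dictionary: Set[str]):
    """Concatenate character, word and hand-crafted features for every character."""
    fused = []
    for word, pos in zip(words, pos_tags):
        # An out-of-vocabulary word degenerates to a zero word vector...
        word_vec = word_table.get(word, [0.0] * WORD_DIM)
        in_dictionary = [1.0 if word in dictionary else 0.0]
        for i, ch in enumerate(word):
            # ...but each of its characters still contributes a character vector.
            char_vec = char_table.get(ch, [0.0] * CHAR_DIM)
            fused.append(
                char_vec
                + word_vec
                + one_hot(pos, POS_TAGS)
                + one_hot(position_tag(i, len(word)), POSITION_TAGS)
                + in_dictionary
            )
    return fused


# Toy word-segmented sentence; "刀片电池" is missing from the word table (OOV) but is in the dictionary.
words = ["比亚迪", "发布", "刀片电池"]
pos_tags = ["n", "v", "n"]
char_table = random_table("".join(words), CHAR_DIM, seed=0)
word_table = random_table(["比亚迪", "发布"], WORD_DIM, seed=1)
dictionary = {"刀片电池"}

vectors = fuse_features(words, pos_tags, char_table, word_table, dictionary)
print(len(vectors), len(vectors[0]))  # 9 16  (4 + 4 + 3 + 4 + 1 features per character)
```

In a full model these per-character vectors would typically be fed to a sequence encoder; simple concatenation is chosen here only because it makes each feature group's contribution easy to inspect.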
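The segment-level decoding behind point (2) of the abstract can be sketched as a semi-Markov Viterbi search: instead of assigning a label to every character, it scores whole labeled segments up to a maximum length and returns the best segmentation. This is a minimal decoding-only sketch, assuming hand-supplied `segment_score` and `transition` functions in place of the learned neural segment features; training (the segment-level forward algorithm) is omitted, and the labels `ORG` and `PROD`, the example sentence, and all scores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int, str]  # half-open span [start, end) plus its label
NEG_INF = float("-inf")
START = "<s>"  # pseudo-label before the first segment


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    segment_score: Callable[[int, int, str], float],
    transition: Callable[[str, str], float],
) -> Tuple[float, List[Segment]]:
    """Highest-scoring split of positions 0..n-1 into labeled segments of length <= max_len."""
    # best[j][y]: best score of a segmentation of prefix [0, j) whose last segment is labeled y
    best: List[Dict[str, float]] = [{} for _ in range(n + 1)]
    back: List[Dict[str, Tuple[int, str]]] = [{} for _ in range(n + 1)]
    best[0][START] = 0.0
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):
                i = j - d  # candidate segment [i, j)
                for prev, prev_score in best[i].items():
                    s = prev_score + transition(prev, y) + segment_score(i, j, y)
                    if y not in best[j] or s > best[j][y]:
                        best[j][y] = s
                        back[j][y] = (i, prev)
    if n == 0:
        return 0.0, []
    # Trace back from the best label at the end of the sentence.
    y = max(best[n], key=best[n].get)
    total, j, segments = best[n][y], n, []
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return total, segments


sentence = "比亚迪发布刀片电池"
# Hypothetical scores a trained model might assign to whole segments.
bonus = {(0, 3, "ORG"): 3.0, (5, 9, "PROD"): 4.0, (5, 7, "PROD"): 1.0}


def segment_score(i: int, j: int, label: str) -> float:
    if label == "O":
        return 0.5 if j - i == 1 else NEG_INF  # non-entity text is split into single characters
    return bonus.get((i, j, label), -1.0)


def transition(prev: str, label: str) -> float:
    return 0.0  # no label-transition preferences in this toy example


score, segments = semi_markov_viterbi(len(sentence), ["O", "ORG", "PROD"], 4, segment_score, transition)
print(score, [(sentence[i:j], y) for i, j, y in segments])
# 8.0 [('比亚迪', 'ORG'), ('发', 'O'), ('布', 'O'), ('刀片电池', 'PROD')]
```

Restricting `O` to single-character segments is a common design choice assumed here, so only entity segments span several characters. Because whole segments are scored, a boundary decision never hinges on a single local tag, and `max_len` keeps the search at O(n · max_len · |labels|²).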
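The selection rule in point (3) of the abstract resembles the information-density approach to active learning, which weights each sample's uncertainty by its average similarity to the rest of the unlabeled pool so that uncertain but isolated outliers are not chosen. The sketch below assumes least-confidence uncertainty (one minus the probability of the model's best labeling), cosine similarity over fixed sentence vectors, and a weighting exponent `beta`; these choices, the function names, and the toy data are hypothetical and may differ from the thesis's exact formulation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def information_density(i: int, vectors: List[Sequence[float]]) -> float:
    """Average similarity of sample i to every other unlabeled sample."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0


def select_batch(best_path_probs, vectors, k, beta=1.0):
    """Rank unlabeled sentences by uncertainty * density**beta and return the top-k indices."""
    scores = []
    for i, p in enumerate(best_path_probs):
        uncertainty = 1.0 - p  # least confidence in the model's best label sequence
        density = max(information_density(i, vectors), 0.0)
        scores.append(uncertainty * density ** beta)
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return ranked[:k]


# Hypothetical pool: sentence vectors and the model's probability for its best labeling.
pool_vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
best_path_probs = [0.95, 0.4, 0.5, 0.2]

print(select_batch(best_path_probs, pool_vectors, k=2))  # [1, 2]
print(select_batch(best_path_probs, pool_vectors, k=2, beta=0.0))  # [3, 1]
```

In the toy pool, sentence 3 is the most uncertain but points away from all the others, so density weighting ranks it below sentences 1 and 2. With `beta=0.0` the rule reduces to plain uncertainty sampling and picks sentences 3 and 1.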
Keywords/Search Tags: new energy vehicle named entity recognition (NER), multi-feature, semi-Markov conditional random fields (SCRF), entity segment, active learning