Research On Fisheries Standard Term Recognition Based On BiLSTM+CRF

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Cheng
Full Text: PDF
GTID: 2393330611491181
Subject: Computer Science and Technology
Abstract/Summary:
Fishery standards are important guiding documents for fishery production. Recognizing named entities in fishery standards is the foundation for building a standards content service system for the fishery industry, and related work such as machine translation, information extraction, and question answering depends on it. With growing computing performance and the breakthroughs that deep learning has made in image and text processing, deep learning is now the main approach to natural language processing tasks. Although the amount of fishery information keeps increasing, the particularity of the fishery field means that no domain dataset or model yet exists for recognizing named entities in fishery standards. This thesis therefore studies deep-learning-based methods for fishery standard named entity recognition, taking the particular characteristics of fishery standard text into account. The specific work is as follows:

(1) Research on labeling methods for fishery standard text. Recognizing named entities in fishery standards requires features of the text structure, but the traditional BIO labeling method cannot express structural information between entities. To address this, the E-BIO labeling method is proposed. By adding text title tags, it enables the model to learn the contextual structure information of an entity. Experiments show that the E-BIO labeling method effectively improves recognition accuracy for fishery standard text entities with structural features.

(2) Research on a BiLSTM+CRF fishery standard named entity recognition model that integrates the attention mechanism. Fishery standard text sequences are long, which leads to dilution of sequence semantics. To address this, an attention mechanism is introduced into the BiLSTM+CRF model, and the semantic dilution problem is handled by generating continually changing semantic vectors in the feature extraction stage. Experiments show that, with the attention mechanism, the accuracy for each type of fishery standard named entity exceeds 90% and the recall exceeds 85%, a substantial improvement over the traditional BiLSTM+CRF model.

(3) Research on a data augmentation method for the fishery standard named entity recognition corpus. Named entities such as aquatic product names are sparsely distributed in the fishery standard corpus, so the model cannot learn enough entity features and recognizes such entities poorly. Based on an analysis of the characteristics of these entities, a corpus augmentation method is proposed that combines similar word replacement with random deletion based on context feature protection. In this method, the "aquatic product name" is treated as the target word and replaced with similar aquatic product names, and random deletion is applied to the sentence while protecting its context features, which increases sample diversity. Experiments show that the two data augmentation algorithms proposed in this study effectively improve the recognition of such entities.
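The two augmentation steps in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify how similar words are chosen or how context features are protected, so the `SIMILAR_NAMES` lexicon, the `B-AQU` tag name, the single-token entity assumption, and the "protect a window of neighbours around each entity" rule are all assumptions made for illustration.

```python
import random

# Hypothetical lexicon of interchangeable aquatic product names (illustrative only).
SIMILAR_NAMES = {
    "carp": ["tilapia", "catfish"],
    "shrimp": ["crab", "prawn"],
}


def replace_similar(tokens, tags, rng):
    """Swap each single-token aquatic product name (assumed tag B-AQU)
    for a randomly chosen similar name; tags are left unchanged."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "B-AQU" and tok in SIMILAR_NAMES:
            out.append(rng.choice(SIMILAR_NAMES[tok]))
        else:
            out.append(tok)
    return out, list(tags)


def random_delete(tokens, tags, rng, p=0.2, window=1):
    """Delete unprotected tokens with probability p.

    Assumed protection rule: entity tokens and tokens within `window`
    positions of an entity are never deleted.
    """
    n = len(tokens)
    protected = set()
    for i, tag in enumerate(tags):
        if tag != "O":
            protected.update(range(max(0, i - window), min(n, i + window + 1)))
    kept = [
        (tok, tag)
        for i, (tok, tag) in enumerate(zip(tokens, tags))
        if i in protected or rng.random() >= p
    ]
    return [tok for tok, _ in kept], [tag for _, tag in kept]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = ["fresh", "carp", "shall", "be", "stored", "below", "4", "degrees"]
tags = ["O", "B-AQU", "O", "O", "O", "O", "O", "O"]

new_tokens, new_tags = replace_similar(tokens, tags, rng)
short_tokens, short_tags = random_delete(new_tokens, new_tags, rng)
print(new_tokens, short_tokens)
```

Keeping the tag sequence aligned with the token sequence in both functions means each augmented sentence can be added to the training corpus without relabeling.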
Keywords/Search Tags: Named entity recognition, BiLSTM, attention mechanism, CRF, data augmentation