Research On Named Entity Recognition Method Of Rail Transit Code

Posted on: 2021-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2392330626962960
Subject: Computer application technology
Abstract/Summary:
Named entity recognition is an important subtask of natural language processing and is key to extracting information from text. Some progress has been made on named entity recognition in open domains, but far less work exists for individual vertical domains. As the knowledge basis of the architectural design stage, the design code plays a decisive role throughout the life cycle of architectural engineering design. For rail transit design codes, this paper first defines five categories of named entities in the field. It then proposes two named entity recognition methods for rail transit design codes: one based on rules and statistics, and one based on deep learning. The specific research content is as follows:

(1) Entity category definitions for rail transit design codes. As the basis of named entity recognition, this paper first defines the entity categories for rail transit design codes. The definition process is problem-oriented: we consider what the construction of a knowledge graph requires, take into account the textual characteristics of rail transit design codes, and refer to entity category definitions from other vertical domains, so that the resulting categories serve knowledge graph construction. The entity categories of rail transit design codes are finally defined as five: item information, attribute value, code name, abstract entity and proprietary entity.

(2) Named entity recognition method based on rules and statistics. The method first recognizes domain-specific information. Then, using the existing knowledge base as a dictionary, it applies an optimized bidirectional maximum matching algorithm, together with an algorithm designed for disambiguation, to obtain preliminary results. On this basis, the paper focuses on the descriptive characteristics of design code text, formulates a boundary correction rule and a combination word update rule, and uses them to refine the preliminary results. Finally, using a frequent pattern tree (FP-Tree), we mine frequent itemsets of C-Value and other parameters from the positive examples, and use these frequent itemsets to filter the positive results of the rule-based processing and obtain the final result. Ablation experiments verify the contribution of each module to the model. We also conduct comparative experiments against three existing classic models and analyze the results.

(3) Named entity recognition method based on deep learning. Starting from the basic recurrent neural network framework, this paper first builds a Bi-LSTM-CRF (Bidirectional Long Short-Term Memory + Conditional Random Field) model. On this basis, we add an attention mechanism to build an Att-BiLSTM-CRF (Attention + Bidirectional Long Short-Term Memory + Conditional Random Field) model. Experiments verify the effectiveness of the neural network method on Chinese rail transit design codes.

Finally, this paper compares the experimental results of the traditional method and the neural network models on the same Chinese rail transit design code data set and identifies the best-performing model. The experimental results show that the proposed methods can address named entity recognition for rail transit design codes and help advance the construction of a knowledge graph for them.
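The dictionary-matching step in (2) can be illustrated with a minimal sketch of plain bidirectional maximum matching. This is not the optimized variant or the disambiguation algorithm developed in the thesis, whose details the abstract does not give. The function names, the toy dictionary and the tie-breaking heuristic (fewer tokens first, then fewer single-character tokens, otherwise the backward result) are assumptions chosen for illustration.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_max_match(text, dictionary):
    """Run both scans and keep the more plausible segmentation.

    Assumed heuristic: prefer fewer tokens, then fewer single-character
    tokens, and fall back to the backward result on a full tie.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(tokens):
        return sum(1 for token in tokens if len(token) == 1)

    return fwd if singles(fwd) < singles(bwd) else bwd


# Hypothetical toy dictionary; a real system would load the domain knowledge base.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(bidirectional_max_match("研究生命的起源", toy_dictionary))
# ['研究', '生命', '的', '起源']
```

In this example the forward scan greedily takes "研究生" and leaves "命" stranded as a single character, while the backward scan recovers "生命". Both scans yield four tokens, so the single-character count decides in favor of the backward result. Ambiguities of this kind are what a disambiguation step has to resolve.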
Keywords/Search Tags: Named entity recognition, Design code, FP-Tree, Deep learning