Research On Entity Recognition And Entity Relation Extraction Of Metro Design

Posted on: 2022-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Yao
GTID: 2492306512976369
Subject: Computer application technology
Abstract/Summary:
With the continuous improvement of public infrastructure, the subway has gradually become the primary choice for daily travel. Subway engineering comprises planning, design, construction and trial operation. Of these, design is the key to ensuring construction quality, and also an important prerequisite for the safety, economy and usability of the subway. The subway design code is an important document that constrains subway design, and it is the result of many years of accumulated experience and repeated validation research in China. To accelerate informatization and intelligent processing in this field, this paper carries out information extraction on the text of the metro design code, mainly entity recognition and entity relation extraction. The specific research contents are as follows:

(1) Corpus construction for the metro design code. At present, research on entity recognition and entity relation extraction in the metro design field is in its infancy, and existing research has not proposed or published an information extraction corpus. This paper analyzes the code text and derives the entity types and relation types of this field, as well as the sublanguage characteristics of the code text. Part of the code text is then tagged by group annotation. The tagging process follows a semi-manual closed-loop principle of "generating data sets - training benchmark model - analyzing prediction errors - formulating data update strategy - updating data sets", and an information extraction corpus based on the metro design code is constructed.

(2) A named entity recognition method based on vocabulary enhancement and a pre-training mechanism. First, training starts from the BiLSTM-CRF entity recognition model, which represents the text with the classical sequence labeling method. The bottom layer of the model encodes characters, but word-level (vocabulary) information usually plays a vital role in determining entity boundaries. To solve this problem, this paper designs a dynamic framework, SW-BiLSTM-CRF, that is compatible with word input, including word boundary and word embedding information, so as to give the model lexical enhancement. At the same time, with the help of the pre-training mechanism, the contextual features of a large-scale unlabeled corpus are transferred to the model training process. The unsupervised training process has two stages, open-domain pre-training followed by in-depth pre-training on 800,000 building-domain code texts, which yields BcBERT; the model is then fine-tuned on the named entity recognition task. Experimental results show that the BcBERT-SW-BiLSTM-CRF model can effectively improve the F1-measure.

(3) An entity relation extraction method based on average-pooling and attention enhancement. First, the code text sequence is represented with BcBERT, and the entity information in the text is obtained through average-pooling. At the same time, the relative position information of entities is used to enrich the word-based attention. Finally, the multi-relation prediction results between multiple entity pairs are obtained through a specific output structure. In the experiments, a number of control experiments were set up to illustrate the efficiency of the method from the perspectives of prediction results and running time.
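
The average-pooling and relative-position ideas in item (3) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names `average_pool` and `relative_positions` are hypothetical, token vectors are plain Python lists standing in for BcBERT outputs, entity spans are half-open index ranges, and the signed-distance definition of relative position is one common choice rather than the thesis's confirmed design.

```python
from typing import List, Tuple

Vector = List[float]


def average_pool(token_vectors: List[Vector], span: Tuple[int, int]) -> Vector:
    """Return the mean of the token vectors inside the half-open span [start, end).

    Hypothetical helper: stands in for pooling encoder outputs over an entity mention.
    """
    start, end = span
    if not 0 <= start < end <= len(token_vectors):
        raise ValueError("span must be a non-empty range inside the sequence")
    width = end - start
    dim = len(token_vectors[0])
    return [
        sum(token_vectors[i][d] for i in range(start, end)) / width
        for d in range(dim)
    ]


def relative_positions(seq_len: int, span: Tuple[int, int]) -> List[int]:
    """Return the signed distance of each token to the entity span (0 inside the span).

    Assumed definition: negative to the left of the entity, positive to the right.
    """
    start, end = span
    distances = []
    for i in range(seq_len):
        if i < start:
            distances.append(i - start)    # token lies before the entity
        elif i >= end:
            distances.append(i - end + 1)  # token lies after the entity
        else:
            distances.append(0)            # token is part of the entity
    return distances


# Toy sequence of four 2-dimensional token vectors with an entity at tokens 1..2.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
entity_span = (1, 3)
entity_vector = average_pool(tokens, entity_span)
position_features = relative_positions(len(tokens), entity_span)
```

In a full model, a vector like `entity_vector` would represent the entity for relation classification, and values like `position_features` would typically be mapped to embeddings that are added to or concatenated with the attention inputs; how the thesis combines them is not stated in the abstract.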
Keywords/Search Tags:Design code, Named entity recognition, Entity relation extraction, BiLSTM, BERT