Research On Bridge Inspection Text Information Extraction Via Prior Information Enhanced Machine Reading Comprehension Neural Models

Posted on: 2022-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: T J Mo
GTID: 2492306566969109
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence technologies such as deep neural networks in recent years, research on text information extraction, with named entity recognition and entity relation extraction at its core, has made good progress. The goal of text information extraction is usually to extract useful information from semi-structured and unstructured text, and such methods have been applied successfully to English text. However, because Chinese text differs greatly from English text, the performance of Chinese text information extraction is inferior to that of English text information extraction. Moreover, in many specialized fields, research on Chinese text information extraction is still at a preliminary stage.

In bridge engineering, inspection text is a very important resource for management and maintenance systems. It contains information about the basic properties of a bridge, its structural parameters, and a large number of defects found during inspection. However, bridge inspection text is currently stored in relational databases as document links, and subsequent management and maintenance activities rely mainly on manual review of the relevant documents. Extracting key information from bridge inspection text is therefore of great significance for bridge management decision support and structural condition assessment. Given that research on text information extraction in the bridge inspection field is still at a preliminary stage, this thesis carries out the following research:

(1) Because no relatively complete text corpus yet exists for the bridge inspection field, this thesis constructs a high-quality corpus for outer-layer and nested named entity recognition in this field, which lays a solid data foundation for subsequent research. During construction of the corpus, the characteristics of bridge inspection text are fully analyzed. Under the guidance of domain experts, the annotation scheme and specifications are established, the annotation methods and procedures are formulated, and the usability of the corpus is evaluated experimentally, resulting in a relatively large corpus.

(2) For the named entity recognition task, based on the bridge inspection named entity recognition corpus, a named entity recognition method using machine reading comprehension enhanced with prior information is proposed, which jointly recognizes outer-layer and nested entities in bridge inspection text. In this method, a machine reading comprehension question carrying prior information is fed to the model together with the text, and contextual character-level features are extracted by the BERT model. Dictionary embeddings trained on large-scale data are also integrated, and the character-level features encoded by a BiLSTM are decoded through character-probability and entity-span prediction. Experimental results show that the proposed method achieves the best F1 score for both outer-layer and nested entities in the bridge inspection field, with F1 scores of 98.50% and 95.33%, respectively.

(3) For the entity relation extraction task, based on the constructed outer-layer and nested entity data sets, an entity relation extraction corpus for the bridge inspection field is further constructed, and an entity relation extraction method based on multi-round machine reading comprehension question answering is proposed, which extracts entities and their relations from bridge inspection text. The method is based on the BERT model and divides entity relation extraction into two parts, head entity extraction followed by tail entity and relation extraction, carried out over multiple rounds of question answering so that the model gradually obtains the entities needed for the next round. The model takes questions containing prior information together with the text as input, which enhances its feature representation, decodes the relation classification from the spans of the entities and their relations, and extracts all entities and their relations in the input text. Experimental results show that this method is significantly better than other methods. In the entity extraction stage, precision, recall and F1 are 94.59%, 95.44% and 95.01%, respectively. In the relation extraction stage, precision, recall and F1 are 64.90%, 67.43% and 66.14%, respectively.

To sum up, this thesis applies natural language processing technology to the bridge inspection field. By fully analyzing the characteristics of bridge inspection text, it constructs a relatively complete named entity recognition corpus with high annotation quality. Based on this corpus, and targeting the two major information extraction tasks in the bridge inspection field, the thesis proposes a named entity recognition method based on machine reading comprehension and an entity relation extraction method based on multi-round machine reading comprehension, and achieves the expected results.
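
The machine reading comprehension formulation of named entity recognition described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The query wording, the entity labels `component` and `component_part`, the 0.5 threshold, the nearest-end pairing rule, and the hand-written probabilities (standing in for the output of a BERT and BiLSTM encoder) are all assumptions made for illustration. The sketch shows why one query per entity type allows an outer entity and a nested entity to overlap in the same text.

```python
from typing import Dict, List, Tuple


def build_mrc_input(query: str, context: str) -> str:
    """Join a query and the inspection text in the usual BERT sentence-pair layout."""
    return f"[CLS]{query}[SEP]{context}[SEP]"


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pair every likely start character with the nearest likely end at or after it.

    The pairing rule is an assumption; other matching strategies are possible.
    """
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for start, prob in enumerate(start_probs):
        if prob < threshold:
            continue
        end = next((e for e in ends if e >= start), None)
        if end is not None:
            spans.append((start, end))
    return spans


def peaked(length: int, index: int, high: float = 0.9, low: float = 0.05) -> List[float]:
    """Hypothetical stand-in for model output: one confident position, the rest low."""
    return [high if i == index else low for i in range(length)]


context = "girder web crack"
n = len(context)

# One query per entity type (labels and wording are hypothetical).
# Each entry: (query text, start probabilities, end probabilities).
queries = {
    "component": ("Which bridge component is mentioned?", peaked(n, 0), peaked(n, 9)),
    "component_part": ("Which part of a component is mentioned?", peaked(n, 7), peaked(n, 9)),
}

inputs: Dict[str, str] = {}
results: Dict[str, List[str]] = {}
for label, (query, start_probs, end_probs) in queries.items():
    inputs[label] = build_mrc_input(query, context)  # what an encoder would receive
    spans = decode_spans(start_probs, end_probs)
    results[label] = [context[s:e + 1] for s, e in spans]

print(results)  # {'component': ['girder web'], 'component_part': ['web']}
```

Because each entity type is decoded from its own query, the nested span "web" does not compete with the outer span "girder web", which a single sequence-labeling pass with one tag per character could not represent.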
Keywords/Search Tags: bridge inspection field, named entity recognition, entity relation extraction, machine reading comprehension