In the information age, the volume of data has grown significantly. Storing and using domain knowledge effectively requires high-quality domain knowledge graphs. This thesis focuses on knowledge graph construction in the aviation domain and proposes a semi-automatic information extraction method for building such graphs, addressing the low automation and high manual cost of current domain knowledge graph construction. The main goal is to improve the automation of entity recognition and relation extraction in the information extraction step. To this end, the following work was carried out:

(1) The construction process for aviation knowledge graphs is standardized into five stages: data acquisition, information extraction, knowledge integration, knowledge processing, and application. This makes the technical roadmap for constructing knowledge graphs in the domain clearer, more standardized, and more systematic.

(2) For the entity recognition task, an aviation entity dataset was constructed and the ElmBERT-BiLSTM-CRFMask entity recognition method was designed. The baseline pre-trained BERT-BiLSTM-CRF model has three shortcomings: first, it is insensitive to local, entity-level information; second, it has high space and computational complexity; and third, its CRF decoder may generate illegal label sequences. The ElmBERT model addresses the first two shortcomings. First, it adds an entity-level masking strategy during pre-training and applies convolution operations to the word vectors, strengthening the model’s perception of entity information. Second, it introduces a hybrid attention mechanism in which dynamic convolution replaces half of the attention heads, reducing the model’s memory footprint. To address the third shortcoming, this thesis introduces a mask-based CRF variant that constrains the candidate paths during decoding, improving decoding efficiency and accuracy. Comparative experiments on public datasets demonstrated the model’s strong performance, and verification experiments on the aviation dataset demonstrated its effectiveness. Finally, ablation experiments were conducted to analyze the contribution of each module.

(3) For the relation extraction task, an aviation relation dataset was constructed and Atti-BRDCNN, an attention-based relation extraction model that uses syntactic dependencies, was designed. Atti-BRDCNN makes two main improvements. First, to overcome the limited ability of the LSTM in the BRCNN model to capture global information, it introduces a multi-head self-attention mechanism after the input layer, which enriches the extracted word vector features and mitigates the LSTM’s long-term dependency problem. Second, because conventional CNN models cannot reflect context and have too many convolution kernel parameters, the conventional convolution is replaced with dynamic convolution, which generates convolution kernels from the context and reduces the number of kernel parameters. Comparative experiments on public datasets and verification experiments on the aviation relation dataset demonstrated the effectiveness of the model. Finally, ablation experiments were conducted to analyze the contribution of each module.

(4) After the ontology of the aviation knowledge graph was constructed, the extracted triples were imported into the Neo4j graph database for storage. On this basis, a knowledge graph platform for the aviation domain was designed and built, integrating querying, question answering, and knowledge visualization. Finally, the platform is demonstrated in full.
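The mask-based CRF decoding described in (2) can be illustrated with a small sketch. The abstract does not specify the exact constraints used in the thesis, so the following assumes a standard BIO tagging scheme, in which an I-X tag may only follow B-X or I-X and may not start a sequence. The tag set, the scores, and the function names (`is_legal`, `viterbi_decode`) are hypothetical and chosen only for illustration; this is not the thesis implementation.

```python
NEG_INF = float("-inf")

def is_legal(prev_tag, tag):
    """Assumed BIO rule: I-X may only follow B-X or I-X (prev_tag=None means sequence start)."""
    if tag.startswith("I-"):
        entity = tag[2:]
        return prev_tag in ("B-" + entity, "I-" + entity)
    return True

def viterbi_decode(emissions, transitions, tags, constrain=True):
    """Viterbi decoding; with constrain=True, illegal transitions are masked to -inf."""
    n = len(tags)
    # Build the (optionally masked) transition table once, before decoding.
    trans = [[transitions[i][j] if (not constrain or is_legal(tags[i], tags[j])) else NEG_INF
              for j in range(n)] for i in range(n)]
    # Initial scores: an I-X tag cannot open a sequence when constraints are on.
    score = [emissions[0][j] if (not constrain or is_legal(None, tags[j])) else NEG_INF
             for j in range(n)]
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the highest-scoring final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]

# Toy example: hypothetical tags and scores for a three-token sentence.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
emissions = [
    [2.0, 0.5, 0.1, 0.1, 0.1],  # token 0 prefers O
    [0.1, 1.0, 2.0, 0.1, 0.1],  # token 1 prefers I-PER (illegal after O)
    [0.1, 0.1, 0.5, 0.1, 2.0],  # token 2 prefers I-LOC (illegal after a PER tag)
]
transitions = [[0.0] * len(TAGS) for _ in TAGS]  # uniform learned transitions

print(viterbi_decode(emissions, transitions, TAGS, constrain=False))  # ['O', 'I-PER', 'I-LOC']
print(viterbi_decode(emissions, transitions, TAGS))                   # ['O', 'B-LOC', 'I-LOC']
```

In this toy case the unconstrained decoder returns an illegal sequence, while the masked decoder returns the best-scoring legal one. Because masked transitions carry a score of negative infinity, they can never be selected, which is how the constraint removes illegal paths from the candidate set.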