| With the continuing advance of informatization, data on power grid equipment defects are increasingly stored as electronic text, and this growing body of data provides a reference for power grid research. Knowledge graphs have improved the effectiveness of complex natural language tasks such as intelligent question answering, intelligent search, and inference. However, most fault case reports written manually by maintenance personnel in power grid equipment operation and inspection scenarios are unstructured, so the knowledge they contain cannot be directly extracted and used. Moreover, fault case reports are highly specialized, semantically complex, and loosely standardized in their descriptions. They contain many overlapped and discontinuous entities, such as "circuit breaker live detection" and "ultrasonic, ultra-high frequency partial discharge detection". In addition, a single entity may be described in many different forms, which leads to diversity and ambiguity problems: for example, "oil chromatographic detection", "oil chromatographic analysis", "oil chromatographic testing", and "oil sample chromatographic analysis" all refer to the entity "oil chromatographic detection". These characteristics make it difficult to construct a knowledge graph from case reports. To address overlapped and discontinuous entities, this paper proposes a named entity recognition algorithm based on Mention Relevance Attention (MRA). A generative entity labeling method is used to overcome the shortcomings of the existing BIOES labeling strategy in identifying overlapped and discontinuous entities. Then, building on the attention mechanism of the Transformer model and incorporating prior knowledge of the predefined entity labels, an attention mechanism based on the correlation between mentions and entity labels is designed and integrated into
the encoder-decoder part. Finally, eight datasets covering flat, overlapped, and discontinuous entities are selected, and comparative experiments on them demonstrate the advantages and effectiveness of the proposed MRA-based named entity recognition algorithm. To address entity diversity and ambiguity, this paper proposes an entity linking algorithm for domain knowledge graphs. Entity linking is divided into two subtasks: candidate entity generation and candidate entity ranking. First, the candidate entity set is constructed by combining domain knowledge, such as the power grid terminology specification, with Wikipedia pages. Then a candidate entity ranking algorithm based on graph convolution is designed. The advantages and effectiveness of the knowledge-graph-based entity linking algorithm are verified on three public datasets. Finally, based on the above research, this paper constructs a dataset for power grid defect cases and implements a prototype system. It first analyzes the characteristics of power grid defect case text and the resulting key points and difficulties of named entity recognition and entity linking. Then, based on the proposed named entity recognition and entity linking algorithms, a dataset for both tasks that fits the application scenario is designed and constructed. Finally, a prototype system for power grid defect cases is designed and implemented, providing entity recognition, entity linking, and knowledge graph query functions. Testing verifies that the system meets the requirements of power maintenance scenarios and the design expectations. This paper studies and implements named entity recognition and entity linking for the power grid equipment defect domain, lays a foundation for future knowledge graph construction in the power field and for decision support in power transmission and transformation, and helps the
implementation and application of natural language processing technology in the power field. |
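
The difficulty with overlapped and discontinuous entities mentioned in the abstract can be illustrated with a small sketch. It uses the abstract's own example phrase and represents each entity as a sequence of token indices, which is one common way to express such entities. The tokenization, the index-sequence format, and all helper names here are assumptions for illustration only, not the paper's actual generative labeling scheme.

```python
# Illustrative sketch only: the tokenization, the index-sequence format and
# the helper names are assumptions, not the labeling scheme used in the paper.
tokens = ["ultrasonic", ",", "ultra-high", "frequency",
          "partial", "discharge", "detection"]

# Each entity is a sequence of token indices, so an entity may skip tokens
# (discontinuous) and two entities may share tokens (overlapped).
entities = [
    [0, 4, 5, 6],     # "ultrasonic partial discharge detection"
    [2, 3, 4, 5, 6],  # "ultra-high frequency partial discharge detection"
]


def entity_text(tokens, indices):
    """Join the tokens selected by an index sequence into a readable string."""
    return " ".join(tokens[i] for i in indices)


def is_discontinuous(indices):
    """True if the index sequence has a gap between consecutive positions."""
    return any(b - a != 1 for a, b in zip(indices, indices[1:]))


def overlapping(a, b):
    """True if two entities share at least one token position."""
    return bool(set(a) & set(b))


def bioes_conflicts(tokens, entities):
    """Tokens that belong to more than one entity.

    A single-layer BIOES tagging assigns exactly one tag per token, so each
    of these tokens would need two tags at once.
    """
    counts = {}
    for ent in entities:
        for i in ent:
            counts[i] = counts.get(i, 0) + 1
    return [tokens[i] for i in sorted(counts) if counts[i] > 1]


if __name__ == "__main__":
    for ent in entities:
        print(entity_text(tokens, ent), "| discontinuous:", is_discontinuous(ent))
    print("overlapped:", overlapping(entities[0], entities[1]))
    print("tokens needing two BIOES tags:", bioes_conflicts(tokens, entities))
```

In this sketch the tokens "partial", "discharge", and "detection" belong to both entities, and the first entity skips over three tokens. A single BIOES tag per token cannot encode either property, whereas a model that generates index sequences can, in principle, emit both entities.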