As living standards improve, people pay more attention to the reliability of the power supply system, and ensuring the stability of the power system has become a concern of the state grid. Secondary equipment mainly protects and controls primary equipment, so its reliability is an important guarantee for the power system. During power system operation, when secondary equipment is defective, staff record the defect as text and store it in the production management system. These defect records not only reflect the historical health status of secondary equipment but also contain information about its reliability. If the defect data are used only for statistics, they are not fully utilized. In addition, defect texts are often classified manually, which is inefficient, and the accuracy depends on the knowledge level of the staff. It is therefore necessary to mine the defect texts in order to extract information that benefits the safe operation of the power system. This paper takes the defect text of secondary equipment as its research object, uses natural language processing methods to construct a classification model for defect text, and uses an improved Apriori algorithm to analyze the defect text. The specific work is as follows:

(1) Text preprocessing is carried out according to the characteristics of defect text. First, the content and structure of defect text are analyzed and its characteristics are summarized. Then the basic process of defect text preprocessing is introduced, and the problems caused by directly transplanting traditional text preprocessing methods to the electric power field are analyzed. To make the word segmentation model suitable for defect text, a professional dictionary is constructed. Based on the performance of commonly used word segmentation models, a word segmentation model based on a power dictionary and a hidden Markov model (HMM) is proposed. On this basis, data cleaning, word segmentation, and stop-word removal are performed on the defect text of secondary equipment, which prepares it for subsequent defect text mining.

(2) On the basis of text preprocessing, a classification model for defect text is constructed. Existing text representation models cannot solve the problem of polysemy in defect text. To solve this problem, a BERT text representation model is constructed. Because most defect texts are short and their lengths vary greatly, the traditional convolutional neural network needs to be improved. In this paper, a multi-scale convolutional neural network extracts features from the word vectors generated by the BERT model and classifies the defect text. Finally, experiments on secondary equipment defect texts from China Southern Power Grid verify the effectiveness of the proposed classification model.

(3) The improved Apriori algorithm is used to analyze the defect text. To address the low computational efficiency of the traditional Apriori algorithm, two improvements are made: on the one hand, defect information is encoded to avoid generating unnecessary candidate itemsets; on the other hand, when judging frequent itemsets, calculation time is shortened by reducing the size of the database. The improved Apriori algorithm is then used to analyze the defect text, find the relationship between fault causes and fault features, and determine whether the power secondary equipment has family defects. Finally, simulation results show that the proposed method saves considerable calculation time.
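
The two Apriori improvements described in (3) can be illustrated with a minimal sketch. It is not the implementation used in this work; it only shows one plausible reading of the two ideas. It assumes that each defect record is encoded as a set of (field, value) pairs and that a record holds at most one value per field, so candidates that repeat a field are skipped. It also shrinks the database after each pass by dropping records that can no longer support a larger itemset. The function name `apriori_reduced`, the field names, and the sample records are all hypothetical.

```python
from itertools import combinations


def apriori_reduced(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    transactions: iterable of sets of (field, value) pairs (assumed encoding).
    min_support: minimum absolute support count.
    """
    db = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for record in db:
        for item in record:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Database reduction: a record with fewer than k+1 items, or with no
        # frequent k-itemset, cannot contain a frequent (k+1)-itemset.
        db = [r for r in db if len(r) > k and any(s <= r for s in frequent)]

        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Encoding-based pruning (assumption): one value per field, so a
            # candidate that repeats a field can never occur in a record.
            fields = [field for field, _ in union]
            if len(set(fields)) < len(fields):
                continue
            # Classic Apriori pruning: every k-subset must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k)):
                candidates.add(union)

        # Count candidates against the reduced database only.
        counts = {c: 0 for c in candidates}
        for record in db:
            for c in candidates:
                if c <= record:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical encoded defect records: cause, feature, and vendor fields.
records = [
    {("cause", "aging"), ("feature", "no_display"), ("vendor", "A")},
    {("cause", "aging"), ("feature", "no_display"), ("vendor", "A")},
    {("cause", "aging"), ("feature", "no_display"), ("vendor", "B")},
    {("cause", "loose_wiring"), ("feature", "comm_loss"), ("vendor", "A")},
    {("cause", "loose_wiring"), ("feature", "comm_loss"), ("vendor", "B")},
]
frequent_sets = apriori_reduced(records, min_support=2)
for itemset, support in sorted(frequent_sets.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data, an itemset that links a cause to a feature corresponds to the cause-feature relationship mentioned above, and an itemset that also includes a vendor hints at how a family defect might surface. Real defect records would need the preprocessing and classification steps from (1) and (2) before they could be encoded this way.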