Research On Analysis Method Of Unstructured Documents In Power Grid Based On Deep Learning

Posted on: 2022-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: S Huo
Full Text: PDF
GTID: 2492306566478454
Subject: Master of Engineering
Abstract/Summary:
With the advent of "Internet+" and the big data era, the amount of data that grid companies accumulate during informatization keeps increasing, and unstructured data accounts for a growing share of the total. Unstructured data is data that cannot be represented structurally by a two-dimensional table; it mainly includes text, audio, video, images, and web pages. As an important enterprise data asset, unstructured data will play an increasingly important role in enhancing core competitiveness. However, several problems commonly arise in mining and using it: the business requirements for mining the value of unstructured data are not yet clear, and the application of that value needs further improvement; the lack of a unified business metadata model specification for unstructured data prevents effective cross-professional and cross-departmental integration and sharing; and unstructured data processing functions are developed repeatedly, without unified planning for unstructured data management platforms.

To address the large volume and low value density of unstructured document data, this paper draws on natural language processing, machine learning, and deep learning, and proposes applying pre-trained language models from natural language processing (pre-trained word vectors and pre-trained encoders) to unstructured document data management, integrating deep learning with traditional power grid unstructured document data management. First, documents from company document management (such as document issuing and receiving, notifications, and meeting management) and from the power business (transmission, transformation, and distribution), together with announcements, notices, requests, work orders, and inspection reports in the OA system, serve as the source of a professional power corpus, from which a corpus of power business data is constructed. Next, word segmentation, part-of-speech tagging, and stop-word removal are applied to obtain a corpus suitable for subsequent processing. Then, different layers of a Transformer feature extractor capture dynamic word vectors carrying different grammatical and semantic information, replacing the static word vectors trained by traditional Word2vec or GloVe, and unstructured document data is represented as vectors in a high-dimensional semantic space. Finally, for specific tasks and data sets, a multi-channel convolutional neural network filters the key information, and the model is fine-tuned to perform text classification.

The proposed text classification model, based on a Transformer and a multi-channel convolutional neural network, effectively improves the representation ability of word vectors, preserves text semantic information more completely, avoids complicated feature engineering, and generalizes well. This research can serve as a reference for subsequent processing of unstructured document data in the power grid. It also accumulates a set of data mining and data analysis techniques for this application field, laying a solid foundation for later data applications in unstructured business systems and building up valuable technical assets.
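
The classification stage described above can be illustrated with a minimal sketch. It assumes that "multi-channel" means parallel convolution branches with different kernel widths, each followed by ReLU and max-over-time pooling, and that per-token vectors are already available (random stand-ins replace the Transformer outputs here). The function names, kernel widths, and dimensions are hypothetical and are not taken from the thesis.

```python
import random


def conv1d_max_pool(token_vectors, kernel, bias=0.0):
    """Slide one kernel over the token sequence, apply ReLU, then max-pool over time.

    token_vectors: list of T vectors, each of length D
    kernel: list of K weight vectors, each of length D (kernel width K)
    """
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activation = bias + sum(
            w * x
            for k_row, x_row in zip(kernel, window)
            for w, x in zip(k_row, x_row)
        )
        best = max(best, activation)
    return best


def multi_channel_features(token_vectors, channels):
    """Concatenate pooled outputs from every kernel of every channel.

    channels: dict mapping kernel width -> list of kernels of that width
    """
    features = []
    for width in sorted(channels):
        for kernel in channels[width]:
            features.append(conv1d_max_pool(token_vectors, kernel))
    return features


def random_kernels(width, dim, count, rng):
    """Untrained placeholder kernels; a real model would learn these weights."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(width)]
        for _ in range(count)
    ]


rng = random.Random(0)
dim = 8  # assumed embedding size, kept tiny for illustration
# Stand-in for contextual token vectors of a 12-token document.
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(12)]
# Three assumed channels with kernel widths 2, 3, and 4, four kernels each.
channels = {w: random_kernels(w, dim, 4, rng) for w in (2, 3, 4)}
features = multi_channel_features(tokens, channels)
print(len(features))
```

In a full model, the concatenated feature vector would typically feed a fully connected softmax layer, and the kernels would be learned during fine-tuning rather than drawn at random.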
Keywords/Search Tags: pre-trained word vectors, pre-trained encoders, feature extractors, multichannel convolutional networks, unstructured documents, text classification