Research On Analysis Method Of Unstructured Documents In Power Grid Based On Deep Learning

Posted on:2022-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:S Huo

Full Text:PDF

GTID:2492306566478454

Subject:Master of Engineering

Abstract/Summary:

With the advent of "Internet +" and the era of big data, the amount of data that grid companies hold keeps growing as informatization proceeds, and the share of unstructured data in that total is growing in particular. Unstructured data is data that cannot be represented structurally by a two-dimensional table; it mainly includes text, audio, video, images, web pages, and so on. As an important enterprise data asset, unstructured data will play an increasingly important role in enhancing core competitiveness. However, several problems commonly arise when mining and using unstructured data: the business requirements for mining the value of unstructured data are not yet clear, and the application of that value needs further improvement; the lack of a unified business metadata model specification for unstructured data prevents effective cross-professional and cross-departmental integration and sharing; unstructured data processing functions are developed repeatedly; and unstructured data management platforms lack unified planning.

Aiming at the large data volume and low value density of unstructured document data, and drawing on current natural language processing, machine learning and deep learning technologies, this paper proposes applying pre-trained language models from natural language processing (pre-trained word vectors and pre-trained encoders) to unstructured document data management, integrating deep learning with traditional power grid unstructured document data management. First, company document management records (such as issuing, receiving, notification, and meeting management), power business documents (such as transmission, distribution, and transformation), and the announcements, notices, requests, work orders, and inspection reports in the OA system are used as sources for a power-domain corpus, from which a corpus of power business data is constructed. The corpus then undergoes word segmentation, part-of-speech tagging, and stop-word removal to make it suitable for subsequent processing. Next, different layers of a Transformer feature extractor are used to capture dynamic word vectors carrying different syntactic and semantic information, in place of static word vectors trained with traditional Word2vec or GloVe, and unstructured document data is represented as vectors in a high-dimensional semantic space. Finally, for specific tasks and data sets, a multi-channel convolutional neural network is introduced to filter the key information, and the model is fine-tuned to perform text classification. The text classification model based on Transformer and a multi-channel convolutional neural network proposed in this paper effectively improves the representation ability of word vectors, preserves text semantic information more completely, avoids complicated feature engineering, and generalizes well.

This research can serve as a reference for subsequent processing of unstructured document data in the power grid. It also accumulates a set of data mining and data analysis techniques for power grid unstructured document data, laying a solid foundation for later data applications in unstructured business systems and building up valuable technical assets.
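
The multi-channel convolution step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state kernel widths, filter counts, or the pooling scheme, so everything below is an assumption: the widths (2, 3, 4), two kernels per width, ReLU with max-over-time pooling, and the helper names `conv1d_max_pool`, `multi_channel_features`, and `make_channels` are all hypothetical. Token vectors are passed in as plain lists and stand in for the dynamic word vectors a Transformer encoder would produce.

```python
import random


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one kernel (width x dim) over the token sequence, apply ReLU,
    and keep the largest response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embeddings) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token_vec))
        best = max(best, total)
    return best


def multi_channel_features(embeddings, channels):
    """Concatenate pooled responses from every channel.

    `channels` maps a kernel width to a list of kernels of that width
    (an assumed layout, not taken from the thesis)."""
    features = []
    for width in sorted(channels):
        for kernel in channels[width]:
            features.append(conv1d_max_pool(embeddings, kernel))
    return features


def make_channels(dim, widths=(2, 3, 4), kernels_per_width=2, seed=0):
    """Build randomly initialised kernels; in a trained model these
    weights would be learned during fine-tuning."""
    rng = random.Random(seed)
    return {
        w: [
            [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(w)]
            for _ in range(kernels_per_width)
        ]
        for w in widths
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Five tokens with 4-dimensional vectors, standing in for encoder output.
    sentence = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
    feats = multi_channel_features(sentence, make_channels(dim=4))
    print(len(feats), "pooled features")
```

Each kernel width acts as one channel that responds to phrases of that length, and the concatenated feature vector would then feed a classification layer. In practice this would be written with a deep learning framework rather than nested Python loops.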

Keywords/Search Tags:

pre-trained word vectors, pre-trained encoders, feature extractors, multi-channel convolutional networks, unstructured documents, text classification

Related items

1	Remote Sensing Image Scene Classification Based On Pre-trained CNN
2	Research On Text Classification Method Of Quality Risk For Customs Home Appliances Based On Deep Learning
3	A software and hardware system for the autonomous control and navigation of a trained canine
4	Research On Text Multi-tag Classification Model Of Rail Transit Equipment Failure
5	Research On Feature Learning And Classification Of Hyperspectral Remote Sensing Images Based On Deep Learning
6	QSⅡ Nuclear Power Plant Automatic Reload Design Documents Form System
7	Research On Retrieval Technology Of Unstructured Text Data In Two-Ticket Training System
8	Research On Text Detection Of Traffic Signs In Complex Natural Scenes
9	A Classification Model Of Power Equipment Defect Record Texts Based On Multi-head Attention RCNN Network
10	Research On Hyperspectral Image Classification Algorithm Based On 3D Convolutional Neural Network