Research On Technology Of Important Information Extraction For Power Grid Infrastructure Engineering Documents

Posted on: 2024-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Peng
Full Text: PDF
GTID: 2532306941970269
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
As the digitization of the State Grid continues to advance, the volume of power grid infrastructure project documents is growing day by day. These documents contain important information that reflects engineering attributes such as project type, project ownership, and participating units. As characteristic features of the documents, this information can be applied to the automatic screening and classification of power grid engineering documents. How to extract it accurately from large volumes of data has therefore become a problem worth studying in the digital transformation of the State Grid. Extracting important information from power grid infrastructure engineering documents means extracting features from their text data, obtaining several pieces of important information, and comparing them with the features of known document types to determine the type of each document, thereby enabling automatic screening and classification and reducing labor costs. To extract important information from the long text data of these documents accurately and effectively, this thesis studies important information extraction technology for power grid infrastructure engineering documents. The main work is as follows.

Firstly, a generative text summary of the text data of power grid infrastructure project documents is obtained using an improved Transformer model. The ELMo algorithm is combined with sentence vector extraction to obtain a dynamic text matrix of the input text that carries rich semantic information. A convolutional neural network is introduced into the multi-head attention mechanism of the Transformer model, so that the attention mechanism can capture local information while still attending to global information. The dynamic text matrix is then fed into the improved Transformer model as input, yielding a high-quality generative text summary of the document text.

Secondly, the generated summary is processed and its TF-IDF values are calculated to obtain the keywords of the summary. Because the summary generated by the Transformer model retains the local features and latent word information of the text, applying the TF-IDF algorithm to the summary rather than to the full text both yields keywords and reduces the amount of computation, so keywords can be extracted accurately and efficiently.

Finally, experiments verify the feasibility and accuracy of the important information extraction model for power grid infrastructure engineering documents. The model can automatically extract the document features of power grid infrastructure engineering documents and meets the needs of automatic screening and classification of engineering documents.
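The TF-IDF keyword step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `tfidf_keywords`, the sample summaries, and the unsmoothed weighting (term frequency times `log(N / df)`) are all assumptions made here for illustration, and the abstract does not say which TF-IDF variant was used. The input is assumed to be already tokenized; Chinese text would first need word segmentation, which is not shown.

```python
import math
from collections import Counter


def tfidf_keywords(summaries, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized summary.

    Assumed weighting: tf = count / len(doc), idf = log(N / df), no smoothing.
    """
    n_docs = len(summaries)
    # Document frequency: number of summaries that contain each term.
    df = Counter(term for doc in summaries for term in set(doc))
    results = []
    for doc in summaries:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical, pre-tokenized summaries standing in for model-generated ones.
summaries = [
    ["substation", "construction", "contract", "substation", "supervision"],
    ["transmission", "line", "construction", "acceptance", "line"],
    ["substation", "design", "drawing", "review", "drawing"],
]

keywords = tfidf_keywords(summaries, top_k=2)
for doc_keywords in keywords:
    print(doc_keywords)
```

With this weighting, a term that occurs in every summary scores zero, so terms shared across document types fall out of the keyword list and the remaining terms are the ones that help tell document types apart. Scoring short summaries instead of full documents keeps the vocabulary and term counts small, which is the computational saving the abstract refers to.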
Keywords/Search Tags: power grid engineering documents, text data, important information, Transformer model, TF-IDF algorithm