| The government attaches growing importance to technological innovation and issues a large number of science and technology policies every day to promote the development of science and technology. Relevant enterprises and researchers must constantly follow a massive volume of science and technology policy information from government websites, official apps, official Weibo accounts, and other sources, and it is difficult to extract useful policy information from it efficiently and accurately. Text summarization condenses a long text into a short, concise one that retains only the key information, which can alleviate information overload. However, science and technology policy texts contain redundant, non-critical information, and current text summarization models struggle to find and comprehend key content such as the policy name and issue number, so the generated summaries miss crucial points. This paper investigates knowledge-enhanced text summarization, which poses two main difficulties: how to obtain useful knowledge, and how to incorporate that knowledge into summary generation. This paper first uses domain-specific named entity recognition to identify the essential entity knowledge in the text, and then uses this science and technology policy entity knowledge to enhance the text summarization model, guiding it to generate more of the important information and thereby improving summary quality on science and technology policy texts. Compared with named entity recognition in the general domain, the main challenge in a professional domain is the scarcity of labeled domain data. To reduce the cost of labeling data, this paper proposes a semi-supervised entity recognition model, DAT-BERTCRF. The model builds on virtual adversarial training (VAT) and dynamically adjusts the loss weights of the supervised and unsupervised data, so that it makes better use of both. Experimental results demonstrate the effectiveness of the proposed model. To integrate science and technology policy entity knowledge into the summarization model effectively and improve the generated summaries, this paper proposes Knowledge-Tr-PGN, a text summarization model that integrates entity knowledge. The model improves on the commonly used abstractive summarization model PGN in two ways. On the one hand, an additional copy pointer copies information from the entity knowledge during summary generation, which makes the model pay more attention to key entity knowledge and alleviates the problem of missing key information in the summary. On the other hand, this paper adopts the Transformer instead of the LSTM as the encoder and decoder, and initializes the encoder with pre-trained weights. Experiments on a science and technology policy data set show that the proposed Knowledge-Tr-PGN model better understands and mines the key content of science and technology policy texts, so that the generated summaries contain more key information and are of higher quality. |
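
The role of the additional copy pointer can be illustrated with a minimal sketch. It assumes, hypothetically, that the final word distribution at each decoding step is a gated mixture of three sources: the decoder's vocabulary distribution, a copy distribution over the source text, and a second copy distribution over the recognized entities. The function name `mix_distributions`, the three-way gate, and the toy values are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def mix_distributions(p_vocab, src_tokens, src_attn, ent_tokens, ent_attn, gates):
    """Combine one generation distribution and two copy distributions.

    p_vocab:    dict mapping word -> probability from the vocabulary softmax
    src_tokens: source-text tokens; src_attn: attention weights over them
    ent_tokens: entity-knowledge tokens; ent_attn: attention weights over them
    gates:      (g_gen, g_src, g_ent), assumed non-negative and summing to 1
    """
    g_gen, g_src, g_ent = gates
    if abs(g_gen + g_src + g_ent - 1.0) > 1e-9:
        raise ValueError("gates must sum to 1")

    final = defaultdict(float)
    # Generate from the fixed vocabulary.
    for word, prob in p_vocab.items():
        final[word] += g_gen * prob
    # Copy from the source text (the standard pointer).
    for token, weight in zip(src_tokens, src_attn):
        final[token] += g_src * weight
    # Copy from the entity knowledge (the additional pointer).
    for token, weight in zip(ent_tokens, ent_attn):
        final[token] += g_ent * weight
    return dict(final)


# Toy example: "No.12" is out of vocabulary and absent from the attended
# source tokens, so it can only be produced through the entity pointer.
final = mix_distributions(
    p_vocab={"the": 0.6, "policy": 0.4},
    src_tokens=["the", "notice", "policy"],
    src_attn=[0.2, 0.5, 0.3],
    ent_tokens=["No.12"],
    ent_attn=[1.0],
    gates=(0.5, 0.3, 0.2),
)
print(sorted(final.items(), key=lambda kv: -kv[1]))
```

Under this assumption, an entity such as an issue number receives probability mass even when it is outside the vocabulary, which is one way an extra pointer could reduce missing key information. In a trained model the gates and attention weights would be predicted by the network rather than fixed by hand.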