| The law enforcement work of public security organs generates massive amounts of unstructured text, such as case files, police incident reports, and case records. Obtaining concise, accurate, and reliable information from this large-scale text data is a prerequisite for police officers to use text resources efficiently, and it supports specific tasks such as police incident summaries, case file summaries, and case briefings. As an information compression technology, text summarization expresses the content of a source text in shorter form. Deep learning has significantly improved the effectiveness of text summarization models, but generated summaries still lack concision and accuracy and can be inconsistent with the original text. This paper optimizes and improves summarization models for practical public security applications, such as judicial document abstracts and case summaries. The main work of this paper includes the following aspects: (1) A keyphrase extraction model based on a Transformer encoder-decoder structure is proposed. The model incorporates position features to improve its prediction of keyphrase boundaries, addressing the inaccurate recognition of phrase boundaries in existing models. Additionally, the model uses the Hungarian algorithm to compute the optimal correspondence between predicted and labeled phrases, dynamically adjusting the training direction and eliminating the influence of the fixed keyphrase ordering in the sample data. Compared with existing models such as CatSeq and ExHiRD-h on public datasets including Inspec, SemEval2017, and KP20k, the model achieves the best results on evaluation metrics such as F1@5, F1@10, and F1@M. (2) A summary generation model based on keyphrase prompt learning is proposed. The model integrates keyphrases as prompt information into the input of the summary generation model and optimizes the semantic representations of the original text and the keyphrases separately. Additionally, the model incorporates a contrastive learning mechanism during training to improve the training objective and address the problem of exposure bias. Compared with GSum and SimCLS on the public datasets CNN/DM and XSum, the model achieves improved results on evaluation metrics such as ROUGE-1 and ROUGE-L. (3) To reduce factual errors in generated summaries, a domain pre-trained language model is used to provide background factual features. Positive and negative summary samples are constructed during the training phase, and contrastive learning is used to enhance the semantic consistency of the encoding process. In addition, a module for detecting and correcting entity errors in summaries is added to further improve the generated summaries. On the public datasets CAIL2020, CAIL2021, and CAIL2022, this model achieves improved BERTScore and MoverScore. Finally, a prototype system for Chinese judicial domain summarization is designed and implemented. Based on public security and judicial business requirements, the system provides functions such as judicial document summarization, judicial Q&A summarization, and case summarization, verifying the effectiveness of the models proposed in this paper. |
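
The order-independent alignment between predicted and labeled keyphrases described in (1) can be illustrated with a minimal sketch. It is not the model's actual implementation: the cost function `token_overlap_cost` and the helper `best_alignment` are hypothetical names introduced here, the cost is assumed to be one minus the Jaccard overlap of tokens, and an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment, but is only practical for a handful of phrases).

```python
from itertools import permutations


def token_overlap_cost(pred: str, gold: str) -> float:
    """Hypothetical cost in [0, 1]: 1 minus the Jaccard overlap of tokens."""
    p, g = set(pred.lower().split()), set(gold.lower().split())
    if not p and not g:
        return 0.0
    return 1.0 - len(p & g) / len(p | g)


def best_alignment(preds: list[str], golds: list[str]) -> list[tuple[int, int]]:
    """Return (pred_index, gold_index) pairs with minimum total cost.

    Exhaustive stand-in for the Hungarian algorithm; assumes there are
    no more predicted phrases than labeled phrases.
    """
    if len(preds) > len(golds):
        raise ValueError("this sketch assumes len(preds) <= len(golds)")
    best_cost, best_perm = float("inf"), ()
    # Try every way of assigning a distinct labeled phrase to each prediction.
    for perm in permutations(range(len(golds)), len(preds)):
        cost = sum(token_overlap_cost(preds[i], golds[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


# The labeled phrases are listed in a different order than the predictions.
preds = ["text summarization", "contrastive learning", "keyphrase extraction"]
golds = ["keyphrase extraction", "text summarization", "contrastive learning"]
alignment = best_alignment(preds, golds)
print(alignment)
```

Because the loss would be computed against the aligned pairs rather than against the labels in their stored order, a prediction is not penalized merely for appearing in a different position than its matching label.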