The Text Summarization Algorithm Research On Legal Documents

Posted on: 2022-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: G Wang
GTID: 2506306509494294
Subject: Computer technology
Abstract/Summary:
In recent years, China's procuratorates and courts have continuously promoted the construction of smart justice and expanded the scope of information technology applications in the judicial field. At the same time, the ever-increasing document-processing workload has posed challenges to the information systems of the relevant judicial departments. Automatic text summarization is an important task and research hotspot in natural language processing. It aims to enable machines to automatically select, compress, and abstract information and to output text that humans can understand. Combined with specific business processes, automatic text summarization can improve the efficiency of information acquisition and reduce users' workload. Research on summarization algorithms and applications for legal documents is therefore a topic of practical significance. This thesis focuses on summarization models based on deep neural networks and improves existing summarization algorithms according to the domain characteristics of legal documents.

First, this thesis proposes a fine-grained pipeline summarization model that combines extraction and generation, built on the summarization dataset of the "2020 China Legal Research Cup Judicial Artificial Intelligence Challenge (CAIL2020)". Statistical analysis of the dataset's information distribution shows that most summary sentences have little information overlap, and that the summaries share a similar paragraph structure and essentially the same narrative order with the original text. Based on this feature, the corpus is further processed and a segmented extractive summarization model with a two-layer Transformer structure is proposed. A comparison of the output of different summarization models shows that the Pointer-generator Network model works better on long texts than on short texts, and that the Rouge scores of the pipeline summarization model proposed in this thesis are better than those of the other models.

Second, this thesis constructs a summarization dataset of about 40,000 indictment instances and proposes, on this dataset, an abstractive summarization model that combines a pre-trained model, sentence feature encoding, and a pointer-generator network. The themes of legal documents are generally concentrated and the wording is standardized. Based on this feature, a sentence feature encoding method is proposed to improve the quality of abstractive summaries. The experiments compare the proposed model with other summarization models and compare different sentence encoding methods; Rouge scores, human evaluation, and case analysis show that the sentence encoding method based on the topic model better improves generation quality.

Finally, the summarization display system designed in this thesis integrates summarization algorithms based on deep neural networks, graph ranking, feature scoring, and topic models. The system provides data preprocessing, text processing, and summarization display functions. The interface adopts a wrap-around layout that is concise, clear, and easy to use. The results of this thesis contribute to the construction of judicial summarization datasets, to research ideas, and to practical applications, and help promote research on smart justice.
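
The abstract compares models by their Rouge scores without saying how the scores were computed. As a rough illustration only, the sketch below computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap between a candidate summary and a single reference. The function names `ngrams` and `rouge_n` and the whitespace tokenization are assumptions made for this example, not the thesis's evaluation code; for Chinese legal text, tokens would more plausibly be characters or segmented words.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N against one reference.

    Hypothetical helper for illustration; `candidate` and `reference`
    are lists of tokens.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    candidate = "the court accepted the case".split()
    reference = "the court heard the case".split()
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

In this toy pair, four of the five unigrams and two of the four bigrams match, so ROUGE-1 comes out higher than ROUGE-2. Published evaluations typically also report ROUGE-L, which is based on the longest common subsequence and is not covered by this sketch.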
Keywords/Search Tags:Pipeline Summarization Model, Abstractive Summarization, Legal Documents, Automatic Text Summarization, Deep Neural Network