| The judicial field contains a large volume of legal provisions, judicial documents, and other textual information. Applying natural language processing technology to the judicial field to enable intelligent justice is an important direction for the field's development in the era of artificial intelligence. The criminal amount extraction task aims to automatically calculate the amount involved in a judicial document through natural language processing, which not only aids information extraction from judicial documents but also assists the prediction of judicial decisions. This paper studies intelligent criminal amount extraction from judicial documents. By analyzing a large dataset of judicial adjudication documents involving criminal amounts, this paper summarizes the classification rules for amount operators and the calculation rules for the final amount, and proposes a rule-based method for criminal amount extraction. Using rule-based annotation followed by manual proofreading, this paper constructs an amount classification dataset. On this basis, the paper uses machine learning to predict the operators and studies a criminal amount extraction method based on a pre-trained language model. To make full use of the domain knowledge contained in large-scale data without operator annotations, the paper introduces task augmentation and studies a criminal amount extraction method that incorporates it. This method obtains a large amount of in-domain auxiliary task data by designing an auxiliary task text generator, and fine-tunes the BERT model while carrying out the auxiliary task, finally obtaining an in-domain BERT model for the target task. The paper further optimizes the loss function by adding the prediction loss of the final amount to it before backpropagation, so that the model can take into account the
characteristics of the final criminal amount while learning the features of the amount operator labels, thus alleviating the problem that the accuracy of the amount operator labels and the accuracy of the final criminal amount change inconsistently during model prediction. The paper uses the "criminal amount element extraction" dataset released in the "Chinese Legal Research Cup Judicial Artificial Intelligence Challenge (LAIC2021)" for comparative experiments, and the experimental results demonstrate the effectiveness of the research work. This paper also designs and implements a criminal amount extraction system for judicial judgment documents, which demonstrates the process and results of criminal amount extraction. Specifically, the research work of this paper includes the following aspects. (1) A rule-based method for criminal amount extraction is proposed. This paper analyzes the content of a large number of judicial adjudication documents and the tasks they involve, and gives a formal definition of the task and a rule-based solution. The scheme decomposes intelligent criminal amount extraction into two subtasks, amount identification and operator classification: it first identifies all amounts in the text, assigns an operator to each amount, and then computes the final amount through user-defined reasoning rules. According to the characteristics of Chinese amount expressions in judgment documents, this paper uses heuristic rules to identify amounts and achieves an accuracy of 99.6%. The method then uses rule-based operator keyword matching to achieve simple and effective operator classification. Using rule-based annotation followed by manual proofreading, this paper constructs an amount classification dataset. (2) To address the weak generalization ability of the rule-based amount operator classification method, a criminal amount extraction method based on a pre-trained language model is proposed. This method uses the pre-trained language model to
generate dynamic semantic representation vectors for the text in the judgment document, uses a softmax classifier to predict the operator of each identified amount, and thereby applies mature text classification technology to amount operator classification. (3) To make full use of unlabeled datasets, this paper proposes a criminal amount extraction method combined with task augmentation. Task augmentation, which uses natural language inference as an auxiliary task, integrates large-scale judicial domain knowledge with the general corpus for training and yields a pre-trained language model dedicated to the judicial domain, one that can perceive in-domain lexical semantics and perform natural language inference. This method makes full use of large-scale domain data, alleviates the problems caused by low-resource tasks, and improves the accuracy of criminal amount extraction. (4) To alleviate the inconsistency between the trends of amount operator classification accuracy and criminal amount prediction accuracy during training of the above model, the paper investigates an amount extraction method that combines the criminal amount with the operator labels. On top of the operator classification loss used in the above model, a weighted criminal amount prediction loss is added, so that during training the model not only learns the features of each amount but also accounts for the loss on the final amount, further improving the accuracy of criminal amount extraction. (5) Based on the techniques proposed in this paper, a system for criminal amount extraction from judicial judgment documents is designed and implemented. The system demonstrates the process of criminal amount extraction from judicial judgment documents. Through examples, it shows the operator annotations of the extracted amounts, demonstrating the
interpretability of the model and testing the practicality of the proposed method. |
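
The three-step decomposition described in aspect (1), identifying amounts, assigning an operator to each, and computing the final amount by rule, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the operator set (`add`, `subtract`, `ignore`), the English trigger keywords, the clause-level context window, and the use of Arabic-numeral amounts followed by "yuan" in place of the Chinese amount expressions the paper actually handles.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical operator set and trigger keywords, for illustration only;
# the paper's actual classification rules are not reproduced here.
OPERATOR_KEYWORDS = {
    "subtract": ("returned", "refunded"),
    "ignore": ("in total", "totaling"),
}
DEFAULT_OPERATOR = "add"

# Simplified amount pattern: an Arabic numeral followed by "yuan".
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*yuan")
# Split into clauses on commas, semicolons, and periods that are not decimal points.
CLAUSE_SPLIT_RE = re.compile(r"[,;]|\.(?!\d)")


@dataclass
class LabeledAmount:
    value: float
    operator: str


def classify_operator(clause: str) -> str:
    """Assign an operator by keyword matching on the clause containing the amount."""
    lowered = clause.lower()
    for operator, keywords in OPERATOR_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return operator
    return DEFAULT_OPERATOR


def extract_labeled_amounts(text: str) -> List[LabeledAmount]:
    """Step 1 and 2: find every amount and label it with an operator."""
    labeled = []
    for clause in CLAUSE_SPLIT_RE.split(text):
        operator = classify_operator(clause)
        for match in AMOUNT_RE.finditer(clause):
            labeled.append(LabeledAmount(float(match.group(1)), operator))
    return labeled


def final_amount(labeled: List[LabeledAmount]) -> float:
    """Step 3: combine the labeled amounts; 'ignore' amounts are skipped."""
    total = 0.0
    for item in labeled:
        if item.operator == "add":
            total += item.value
        elif item.operator == "subtract":
            total -= item.value
    return total
```

Under these assumed rules, the sentence "The defendant stole a phone worth 3000 yuan; he later stole cash of 1500.5 yuan; he returned 500 yuan to the victim." yields the operators `add`, `add`, `subtract` and a final amount of 4000.5. Because each amount keeps its operator label, the final figure can be traced back to individual amounts, which is the kind of interpretability the system in aspect (5) is meant to show.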