The Design And Implementation Of Judicial Text Data Automatic Generation System

Posted on: 2021-03-25  Degree: Master  Type: Thesis
Country: China  Candidate: S Zhang  Full Text: PDF
GTID: 2416330647450880  Subject: Engineering
Abstract/Summary:
In the context of smart court construction in China, the judiciary has introduced deep learning into justice and legal services and continues to achieve new results there. In natural language processing, judicial deep learning systems take judicial texts, such as judgment documents or case facts, as input. Widely used systems include automatic sentencing prediction systems, law and crime prediction systems, and recommendation systems for similar cases. A lack of text data degrades the performance of judicial deep learning models. In training, insufficient training data leads to poor generalization. In testing, the evaluation metrics are one-dimensional, and there are no multi-dimensional test data sets designed around the characteristics of the judicial domain.

This thesis designs and implements an automated system for generating judicial text data, consisting of a training data generation module and a testing data generation module. The training data generation module provides data augmentation for judicial deep learning models, increasing the amount of high-quality judicial training text and improving prediction accuracy. It offers two generation methods, one rule-based and one based on the Variational Auto-encoder (VAE). The rule-based method is an augmentation method tailored to the characteristics of judicial text. The VAE-based method applies the VAE to text generation: it learns the mapping between a Gaussian distribution and the data distribution, and reconstructs new text with a similar distribution.

The testing data generation module provides testing data for multi-dimensional evaluation, diversifying the test metrics of judicial deep learning models. It includes a method for generating testing data with noise items, used to evaluate a model's robustness to noise, and a method for generating testing data for adversarial attacks. The adversarial attack is based on a genetic algorithm and changes the text as little as possible; the resulting testing data is used to evaluate the model's resistance to attack.

The system is delivered as a web application built on the Django framework. It supports user-defined generation parameters and returns the generated text to the user as a file. HDFS serves as the file management system to improve the scalability of file storage. Extensive experiments show that both training data generation methods are effective and increase the accuracy of crime prediction models based on FastText, TextCNN, and LSTM, and that the two testing data generation methods support multi-dimensional evaluation of judicial deep learning models.
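The abstract does not specify how noise items are constructed, so the following is only a minimal sketch of the general idea of noise-injected testing data, not the thesis's method. It assumes text is already tokenized and that noise means inserting tokens from a separate noise vocabulary. The names `add_noise_items`, `build_noisy_test_set`, `noise_vocab`, and `ratio` are hypothetical.

```python
import random


def add_noise_items(tokens, noise_vocab, ratio=0.1, seed=None):
    """Insert noise tokens at random positions; original tokens keep their order.

    Hypothetical helper: `noise_vocab` and `ratio` are illustrative parameters,
    not names taken from the thesis.
    """
    if not tokens:
        return []
    rng = random.Random(seed)  # seeded generator makes the output reproducible
    n_noise = max(1, round(len(tokens) * ratio))  # at least one noise item
    noisy = list(tokens)
    for _ in range(n_noise):
        # randint is inclusive on both ends, so a noise item may be appended
        position = rng.randint(0, len(noisy))
        noisy.insert(position, rng.choice(noise_vocab))
    return noisy


def build_noisy_test_set(samples, noise_vocab, ratio=0.1, seed=0):
    """Apply noise to every (tokens, label) pair while keeping the label unchanged."""
    rng = random.Random(seed)
    return [
        (add_noise_items(tokens, noise_vocab, ratio, rng.randrange(2**32)), label)
        for tokens, label in samples
    ]
```

Comparing a model's accuracy on the original samples with its accuracy on the output of `build_noisy_test_set` at several `ratio` values gives one possible measure of robustness to noise.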
Keywords/Search Tags: Data Generation, Natural Language Processing, Variational Auto-encoder, Adversarial Attacks, Hadoop