
Research On Case Text Grammatical Error Detection And Cause Of Case Prediction Under Unbalanced Data Conditions

Posted on: 2023-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: L L Ma
GTID: 2568306902485554
Subject: Software engineering
Abstract/Summary:
Since the 18th National Congress of the Communist Party of China, the Central Committee of the Communist Party of China has put forward the national strategy of "A Comprehensive Framework for Promoting the Rule of Law" to safeguard the people's rights and interests and ensure the orderly progress of social development. How to use existing resources to improve work efficiency and accelerate the implementation of this strategy has become an urgent problem. In this context, the Supreme Court put forward the Intelligent Justice initiative, which aims to use information technology to manage judicial work intelligently and efficiently in place of traditional manual processing. Research on Intelligent Justice is now being carried out in many directions, and this thesis studies two of them: Case Text Grammatical Error Detection and Cause of Case Prediction. Case Text Grammatical Error Detection is the detection of grammatical errors in text data produced during judicial work; with this technology, staff can check the grammar of such text quickly and accurately within a limited time. Cause of Case Prediction assigns a case a qualitative, general description (its cause of case); with this technology, staff can determine the cause of a case quickly and accurately at relatively low cost, which helps the court hear the case and helps the public understand the nature of the case. Effective methods for both tasks are therefore of great significance for improving the efficiency and accuracy of judicial work, so as to better serve the people and the country.

In the course of this research, however, this thesis finds that both tasks suffer from a serious unbalanced data problem: some categories have a large number of samples, while others have only a few. This imbalance biases the model's performance toward the categories with many samples, even though in most practical applications the categories with few samples are often the more important ones. For example, among people who undergo a physical examination, only a few may have cancer, yet the cost of misdiagnosing a healthy person as having cancer is very different from the cost of misdiagnosing a cancer patient as healthy. Reducing the impact of unbalanced data on performance is therefore an urgent problem.

Data analysis shows that, in both tasks, the categories with few samples can be grouped into a larger parent category, which balances the data distribution to a certain extent. Based on this finding, and to address the unbalanced data problem, this thesis constructs a hierarchical network model following the idea of hierarchical learning. Specifically, it first builds a data hierarchy from the hierarchical relationships between categories, then designs a network and corresponding supervision signal for each layer, and trains the whole model with backpropagation. Following the hierarchical relationship, the network at each layer passes information from the upper layer down to the lower layer, producing new features that fuse information from different layers. This method reduces, to a certain extent, the performance loss caused by unbalanced data. In addition, for Cause of Case Prediction, this thesis introduces Focal Loss as a class-balancing loss function and uses prior knowledge as the weights of the Focal Loss to further improve performance.

F1 is used as the evaluation metric, and a series of comparative and ablation experiments are carried out on real data sets. The experimental results show that, compared with the baseline model, the proposed hierarchical learning method significantly improves performance on both tasks.
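The training objective described in the abstract (per-layer supervision over a category hierarchy, plus a prior-weighted Focal Loss) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the thesis's implementation: the frequency threshold used to merge rare categories into one coarse group, the use of normalized inverse class frequency as the "prior knowledge" weight, applying Focal Loss at both layers, equal weighting of the two layer losses, and all names (`build_hierarchy`, `prior_weights`, `focal_loss`, `hierarchical_loss`, `RARE`, `min_count`, the category labels) are assumptions. The thesis builds its hierarchy from the categories' actual hierarchical relationships and also fuses upper-layer features into the lower layer, which this sketch does not model; it shows only how the standard Focal Loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), and per-layer supervision combine into one loss.

```python
import math
from collections import Counter

RARE = "RARE"  # hypothetical name of the coarse group that absorbs rare categories


def build_hierarchy(labels, min_count):
    """Map each fine category to a coarse (upper-layer) category.

    Hypothetical rule: categories with fewer than `min_count` samples are merged
    into one shared RARE group; frequent categories keep their own group.
    """
    counts = Counter(labels)
    return {c: (c if n >= min_count else RARE) for c, n in counts.items()}


def prior_weights(labels):
    """Assumed prior-knowledge weights: normalized inverse class frequency."""
    counts = Counter(labels)
    inverse = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inverse.values())
    return {c: w / total for c, w in inverse.items()}


def softmax(logits):
    """Numerically stable softmax over a {category: logit} dict."""
    top = max(logits.values())
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def focal_loss(logits, target, alpha, gamma=2.0):
    """Focal Loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def hierarchical_loss(coarse_logits, fine_logits, fine_target, hierarchy,
                      coarse_alpha, fine_alpha, gamma=2.0):
    """Per-layer supervision: the coarse layer is trained on the group label and
    the fine layer on the original category; the two losses are summed
    (equal weighting is an assumption)."""
    coarse_target = hierarchy[fine_target]
    return (focal_loss(coarse_logits, coarse_target, coarse_alpha, gamma)
            + focal_loss(fine_logits, fine_target, fine_alpha, gamma))


# Toy usage with made-up category names and counts.
labels = ["theft"] * 6 + ["fraud"] * 3 + ["arson", "bribery"]
hierarchy = build_hierarchy(labels, min_count=3)
coarse_alpha = prior_weights([hierarchy[c] for c in labels])
fine_alpha = prior_weights(labels)

coarse_logits = {"theft": 0.2, "fraud": 0.1, RARE: 1.5}
fine_logits = {"theft": 0.3, "fraud": 0.0, "arson": 1.2, "bribery": 0.4}
loss = hierarchical_loss(coarse_logits, fine_logits, "arson", hierarchy,
                         coarse_alpha, fine_alpha)
print(hierarchy)  # {'theft': 'theft', 'fraud': 'fraud', 'arson': 'RARE', 'bribery': 'RARE'}
print(loss)
```

In the toy data, the coarse layer sees counts of 6, 3 and 2 instead of 6, 3, 1 and 1, which is the partial rebalancing the abstract attributes to grouping rare categories. Setting `gamma` to 0 and every weight to 1 reduces each term to ordinary cross-entropy, a convenient sanity check; with `gamma` above 0, well-classified samples (large p_t) contribute little, so training concentrates on hard and rare examples.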
Keywords/Search Tags: Hierarchical Learning, Unbalanced Data, Case Text Grammatical Error Detection, Cause of Case Prediction, Prior Knowledge