
Research On Sentence Prediction In Criminal Cases For Sentencing Documents

Posted on: 2021-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhang
GTID: 2516306458966199
Subject: Computer technology
Abstract/Summary:
Term of imprisonment prediction aims to predict the sentence from the fact description of a criminal case. Methods that rely on superficial features or on the dependence between subtasks cannot handle well the problem that cases involving the same crime receive different sentences. To solve sentence prediction effectively, two things become particularly important: how the corpus is processed and constructed, and how efficiently the key elements in the case description are extracted. The sentence of a criminal case can then be predicted from those key elements. This thesis therefore does the following research work on this issue.

1. Construction of a corpus for criminal cases. This thesis first downloads the dataset from the Chinese AI and Law challenge (CAIL2018) competition and analyzes the data comprehensively. It then uses pattern matching to extract the part of the corpus required for the experiments, namely the attribute values corresponding to fact and term_of_imprisonment, and cleans and processes the extracted data. Finally, because the prison term categories are imbalanced, additional judgments are crawled from China Judgments Online and processed to expand the low-frequency categories.

2. An end-to-end key element extraction method based on a double layer LSTM (Long Short-Term Memory)-Attention model. This method uses the idea of sequence tagging. First, the Jieba tool is used to segment the corpus, and LAC (Lexical Analysis of Chinese) is used to label the parts of speech. The skip-gram model of Word2vec is then trained on the segmented data to produce word vectors, and the word vector of each word in the case description is fed to an LSTM to obtain the semantic information of the case description text. Second, the attention mechanism identifies the important words in this semantic representation. Finally, the encoded important words are decoded by an LSTM, and the words tagged as verbs or nouns are taken as the key element information to be extracted.

3. A sentence prediction method based on the BERT (Bidirectional Encoder Representations from Transformers) model and the fusion of key elements. Borrowing the idea of sequence tagging, this method first obtains the key element information through the long short-term memory network (LSTM) model with attention mechanism and converts it into a vector representation. It then uses the BERT model to obtain the case representation vector. Finally, the key element vector representation is fused with the case representation vector, and the prison term prediction model is obtained by training a softmax classifier.
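The corpus extraction step in item 1 can be sketched roughly as follows. This is a minimal illustration, not the thesis's own code. It assumes the CAIL2018 data is stored as one JSON object per line, with a top-level `fact` field and a `term_of_imprisonment` object nested under `meta`. It uses JSON parsing in place of the pattern matching the abstract mentions. The function names `extract_samples` and `term_distribution`, and the demo records, are hypothetical.

```python
import json
from collections import Counter


def extract_samples(lines):
    """Collect (fact, term_of_imprisonment) pairs from JSON-lines records.

    Assumed record layout (not confirmed by the abstract):
    {"fact": "...", "meta": {"term_of_imprisonment": {"imprisonment": <months>, ...}}}
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        fact = (record.get("fact") or "").strip()
        term = (record.get("meta") or {}).get("term_of_imprisonment")
        if not fact or term is None:
            continue  # basic cleaning: drop records missing either field
        samples.append((fact, term))
    return samples


def term_distribution(samples):
    """Count samples per prison term (in months) to expose rare categories."""
    return Counter(term.get("imprisonment") for _, term in samples)


# Hypothetical demo records, made up for illustration only.
demo_lines = [
    '{"fact": "Case A description.", "meta": {"term_of_imprisonment": {"imprisonment": 6}}}',
    '{"fact": "Case B description.", "meta": {"term_of_imprisonment": {"imprisonment": 6}}}',
    '{"fact": "Case C description.", "meta": {"term_of_imprisonment": {"imprisonment": 36}}}',
    '',
    '{"fact": "", "meta": {}}',
]
samples = extract_samples(demo_lines)
distribution = term_distribution(samples)
```

A count like `distribution` is one simple way to see which prison term categories are low-frequency and might need expanding with additional judgments.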
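The fusion and classification step in item 3 can be illustrated with a small dependency-free sketch. The abstract does not say how the two vectors are fused, so concatenation is an assumption here. The function `predict_term_class`, the toy vectors, and the weights are likewise hypothetical; in the described method the case vector would come from BERT and the key element vector from the LSTM-Attention extractor.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_term_class(case_vec, element_vec, weights, bias):
    """Fuse two vectors by concatenation (an assumption) and classify.

    weights holds one row per prison term class; each row has
    len(case_vec) + len(element_vec) entries.
    """
    fused = list(case_vec) + list(element_vec)  # concatenation as fusion
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    label = probs.index(max(probs))  # index of the most probable class
    return label, probs


# Toy values for illustration only; real vectors would be model outputs.
case_vec = [0.2, -0.1]
element_vec = [0.5]
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
bias = [0.0, 0.0, 0.0]
label, probs = predict_term_class(case_vec, element_vec, weights, bias)
```

In training, the weights and bias would be learned jointly with the encoders rather than fixed by hand as they are here.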
Keywords/Search Tags: Term of imprisonment prediction, Construction of corpus, Part of speech tagging, Double layer LSTM-Attention model, Key elements, BERT model