
Research On Representation And Classification Of Legal Issue Based On Hierarchical Clustering Of Multiple Semantic Factors

Posted on: 2021-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2506306230978099
Subject: Software engineering
Abstract/Summary:
With the rapid development of big data technology, artificial intelligence is gradually being introduced into the legal field, and a growing number of intelligent legal applications have appeared, mainly for assisting litigation and assisting trial. Parties who encounter a legal case need to find similar cases and distinguish case types. To meet this need, this thesis proposes a text representation model that focuses on text similarity and designs a text classification model that incorporates multiple kinds of semantic information.

Current text representation models are mainly based on element statistics, and the mainstream statistical models consider the clustering of similar elements. However, for subjective legal texts with non-standard terms and a wide vocabulary, clustering on a single factor does not fully reflect the association characteristics of elements. In addition, some studies train the text representation model directly with machine learning algorithms; these focus mainly on the context information of the elements and lose the association features of similar elements that clustering should capture. Text classification can be performed directly using the distance calculation of the text representation model, but classifying text more accurately requires mining more abstract independent and associated features of the text. Generally, researchers design classification algorithms suited to the characteristics of an existing text representation model and train them in depth.

This thesis analyzes the shortcomings of current text representation models and text classification methods when applied to legal issues. Supported by legal big data, it proposes a text representation model, MSC-TK, and a corresponding text classification model, mtCNN. For the units in the background corpus, association features between them are mined based on multiple semantic factors (a semantic factor is a way of associating each word with respect to other words). These semantic factors include semantic similarity features obtained from word2vec, semantic relevance features obtained from word2vec parameters combined with pointwise mutual information, and semantic relevance features based on a knowledge graph. All the words in the background corpus are then hierarchically clustered based on these association features, so that the effect of related words on text distance is reflected more accurately. Finally, combining mathematical statistics and deep learning methods, a text representation model and a classification model for legal problems are constructed that carry multi-dimensional, deep information.

Specifically, the text representation model of a legal question takes the form of a vector space model whose vector units include word units and cluster units with clustering weights. The classification model uses a convolutional neural network, which borrows the principle of edge detection in image classification, and uses the constructed text representation model as a weight model for the training data to train a text classification model containing multiple layers of semantic association features, which is highly interpretable. The experimental results show that the representation model constructed in this thesis can effectively and accurately match similar legal cases in the legal database, and that the trained classification model can accurately classify a given legal problem.
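
The pipeline described in the abstract (association features, hierarchical clustering of words, then a vector space model with word units and cluster units) can be illustrated with a minimal sketch. This is not the thesis's MSC-TK implementation: it uses only one semantic factor (document-level pointwise mutual information) in place of the three combined factors, a plain average-linkage clustering, and a hypothetical `cluster_weight` parameter. The toy corpus, the function names, and the default score of -1.0 for word pairs that never co-occur are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(docs):
    """Document-level PMI for every word pair that co-occurs at least once."""
    n = len(docs)
    word_df = Counter(w for d in docs for w in set(d))
    pair_df = Counter(p for d in docs for p in combinations(sorted(set(d)), 2))
    return {
        (a, b): math.log((c / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), c in pair_df.items()
    }

def cluster_words(vocab, sim, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once the
    best available merge scores below `threshold`.
    """
    clusters = [[w] for w in vocab]

    def link(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def represent(tokens, clusters, cluster_weight=0.5):
    """Sparse vector with word units (counts) plus weighted cluster units."""
    vec = Counter()
    for t in tokens:
        vec["w:" + t] += 1.0
    for idx, members in enumerate(clusters):
        hits = sum(1 for t in tokens if t in members)
        if len(members) > 1 and hits:
            vec["c:%d" % idx] = cluster_weight * hits  # hypothetical weighting
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy background corpus (assumed data, already tokenized).
docs = [
    ["landlord", "deposit", "refund"],
    ["tenant", "deposit", "refund"],
    ["landlord", "tenant", "lease"],
    ["employer", "wage", "overtime"],
    ["employee", "wage", "overtime"],
]
pmi = pmi_scores(docs)
vocab = sorted({w for d in docs for w in d})
sim = lambda a, b: pmi.get((min(a, b), max(a, b)), -1.0)  # -1.0: assumed default
clusters = cluster_words(vocab, sim, threshold=0.0)

q1, q2 = ["employer", "overtime"], ["employee", "wage"]
print(cosine(represent(q1, []), represent(q2, [])))              # word units only
print(cosine(represent(q1, clusters), represent(q2, clusters)))  # with cluster units
```

The two queries share no words, so a word-only vector space model gives them zero similarity. Once cluster units are added, related words contribute to the text distance, which is the effect the abstract attributes to hierarchical clustering over association features.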
Keywords/Search Tags:law, text representation, text classification, semantic association, deep learning