| A knowledgebase system describes concepts, entities, and their relationships in the objective world in structured form. Such systems are widely used across industries and domains and have become a powerful support for applications such as semantic search, big data analysis, intelligent recommendation, and question answering. With the implementation of the intelligent justice strategy in China, there is an urgent need to build a variety of legal knowledgebases to support intelligent judicial applications such as legal judgment assistance systems and similar-case recommendation systems. The knowledgebase of properties involved in criminal cases automatically extracts knowledge about the disposal of such properties from existing laws and regulations, in order to support the property management systems used by judicial units such as public security bureaus, procuratorates, courts, and justice bureaus during judicial enforcement. This thesis focuses on named entity recognition techniques for the legal knowledgebase of properties involved in criminal cases. As with all knowledgebase systems, named entity recognition is the core technology for knowledgebase construction. However, owing to the particular nature of this knowledgebase, the named entity recognition task faces the following challenges during its creation: 1. Because the knowledgebase extracts knowledge directly from legal articles according to the project requirements, the training corpus is drawn only from relevant laws and regulations currently in force, so it is smaller than the corpora available for named entity recognition in general-purpose knowledgebase construction. 2. Because of the infallibility required of justice, the goal of constructing the knowledgebase of
properties involved in criminal cases is to provide knowledge support for relevant applications in judicial practice, which requires the knowledge in the knowledgebase to be correct. This in turn places extremely high requirements on the accuracy of named entity recognition, the first step in constructing the knowledgebase. 3. Because legal documents are specialized and highly professional, data annotation requires the assistance of professionals with a legal background, which makes annotation more costly than for a traditional knowledgebase. In response to these challenges, this thesis explores the named entity recognition problem for the legal knowledgebase of properties involved in criminal cases and accomplishes the following work. (1) Through analysis of existing legal documents, we collect 11 currently effective legal documents related to the disposal of properties involved in criminal cases. After text preprocessing, the corpus is annotated with the BMEOS tagging scheme, yielding an annotated corpus for named entity recognition. (2) To address the limited training corpus and the high recognition performance requirement, ensemble learning is introduced into the named entity recognition task for this knowledgebase. On the annotated corpus, four existing recognition methods (HMM, CRF, MEM, and BiLSTM) are first applied separately to the named entity recognition problem and then ensembled in parallel using different combination strategies. Experiments demonstrate that ensemble learning effectively improves recognition performance with a limited training corpus. (3) To address the challenge of costly data annotation, this thesis uses the tri-training method from semi-supervised learning to perform named entity recognition. Since the
diversity of learners in traditional tri-training depends on sampling with replacement, and tri-training may degenerate into self-training of a single learner when the training corpus is small, this thesis refines the tri-training algorithm and proposes a semi-supervised named entity recognition algorithm for the knowledgebase of properties involved in criminal cases. (4) Also to cope with the high cost of data labeling, this thesis applies model-based transfer learning to the named entity recognition task in constructing the knowledgebase of properties involved in criminal cases, so as to reduce the corpus annotation workload. The experimental results suggest that this approach can reduce the overhead of corpus annotation. |
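
The BMEOS tagging scheme named in contribution (1) can be illustrated with a minimal sketch. It assumes the usual reading of the scheme (B = beginning, M = middle, E = end of a multi-token entity, O = outside any entity, S = single-token entity). The function name `bmeos_tags`, the entity types `ORG` and `PROPERTY`, and the example sentence are hypothetical and are not taken from the thesis corpus.

```python
from typing import List, Tuple

# An entity span: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bmeos_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Return one BMEOS tag per token for the given entity spans.

    Hypothetical helper for illustration; spans must not overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, etype)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, etype)}")
        if end - start == 1:
            # A one-token entity gets the S (single) tag.
            tags[start] = f"S-{etype}"
        else:
            # A longer entity is B, then zero or more M, then E.
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags


# Assumed example sentence and entity types, for illustration only.
tokens = ["the", "public", "security", "bureau", "seizes", "cash"]
spans = [(1, 4, "ORG"), (5, 6, "PROPERTY")]
for token, tag in zip(tokens, bmeos_tags(tokens, spans)):
    print(f"{token}\t{tag}")
```

For Chinese legal text the same function would typically be applied per character rather than per word, since each character then plays the role of a token.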