
Research on Chinese Named Entity Recognition Methods in the Judicial Field

Posted on: 2024-09-21  Degree: Master  Type: Thesis
Country: China  Candidate: Z B Peng  Full Text: PDF
GTID: 2556307091997139  Subject: Software engineering
Abstract/Summary:
With the rapid development of artificial intelligence (AI) technology, AI has taken root in many fields. The judicial field has also begun to integrate AI, and the smart court is one result. In recent years, the smart court has attracted widespread attention from government, academia, and industry. Named entity recognition (NER), the task studied in this thesis, is a fundamental task in AI applications, and NER research in the judicial field supports the application of AI to case analysis, case description, electronic evidence collection, and legal document assistance in smart courts. However, several problems remain: the judicial field lacks publicly available NER datasets of sufficient scale; entity annotation in the judicial field requires specialized domain knowledge, so commonly used NER models cannot be applied directly; and traditional word-level or character-level NER models have limited effectiveness. This thesis therefore studies NER methods specifically for the judicial field. The specific work is as follows.

First, to address the shortage of judicial corpora, we manually collect data from previous editions of the "Legal AI Challenge" and judgment documents from the Chinese Judgment Document Network as corpus data. The data is cleaned and preprocessed. We design entity categories specific to the judicial field and manually annotate the corpus with them, constructing a labeled judicial-field dataset that is used in the rest of the thesis.

Second, commonly used word-granularity NER models that integrate vocabulary information handle long texts poorly, perform redundant computation, and incur high memory and computational costs. To address these issues, we propose a cross-attention mechanism. The mechanism governs the interaction between the character feature sequence and the vocabulary feature sequence, producing character feature vectors that integrate vocabulary information and relative position encoding information. Keeping the character sequence and the vocabulary sequence separate in this way reduces memory and computational costs, significantly improves the model's ability to process long texts, and solves or alleviates the problems above.

Next, we construct the CrossFormer-CRF model. We apply self-attention to the character features output by the CrossFormer model and constrain the labels with a Conditional Random Field (CRF) to obtain the optimal named entity label sequence. The model effectively integrates vocabulary information and relative position information, which improves its prediction performance.

Finally, we compare the proposed model with baseline models and other commonly used NER models on our self-built dataset. The results show that our model outperforms the other models on various metrics, which validates its effectiveness.
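
The cross-attention step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's CrossFormer implementation: it assumes plain scaled dot-product attention in which character features act as queries and vocabulary (matched-word) features act as keys and values, and it omits the relative position encoding and the CRF layer. All names (`cross_attention`, `rand_matrix`, the projection matrices) and the dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(char_feats, word_feats, w_q, w_k, w_v):
    """Assumed form: characters are queries, matched words are keys/values."""
    q = matmul(char_feats, w_q)          # (n_chars, d)
    k = matmul(word_feats, w_k)          # (n_words, d)
    v = matmul(word_feats, w_v)          # (n_words, d)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k: (d, n_words)
    scores = matmul(q, k_t)              # (n_chars, n_words)
    weights = [softmax([s / scale for s in row]) for row in scores]
    fused = matmul(weights, v)           # one lexicon-aware vector per character
    return fused, weights


# Hypothetical sizes: 6 characters, 3 matched words, feature dimension 8.
rnd = random.Random(0)


def rand_matrix(rows, cols):
    return [[rnd.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


n_chars, n_words, d_model = 6, 3, 8
char_feats = rand_matrix(n_chars, d_model)
word_feats = rand_matrix(n_words, d_model)
w_q, w_k, w_v = (rand_matrix(d_model, d_model) for _ in range(3))

fused, weights = cross_attention(char_feats, word_feats, w_q, w_k, w_v)
```

In this sketch the score matrix has shape `n_chars × n_words`, whereas self-attention over a single concatenated sequence of characters and words would need `(n_chars + n_words)²` scores. That difference is one plausible reading of how separating the two sequences lowers memory and computational cost.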
Keywords/Search Tags:Named Entity Recognition, Judicial Field, Cross-Attention, CrossFormer