Research On New Word Discovery Algorithm Based On Legal Documents

Posted on:2024-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:L Chen

Full Text:PDF

GTID:2556307079970619

Subject:Electronic information

Abstract/Summary:

In recent years, artificial intelligence (AI) has developed rapidly alongside broader technological progress. Against this backdrop, the Supreme People’s Court of China has called for vigorously strengthening the development and application of smart judicial systems. Natural language processing (NLP) is now routinely applied in the legal field, for example in case text classification and automatic legal question answering. These downstream NLP tasks all rest on a basic task: Chinese word segmentation. The legal field contains a large amount of specialized vocabulary that is continually updated as society develops. As a result, although existing segmentation tools are mature for general-domain text, Chinese word segmentation in specialized domains still has many problems. The main remedy is to discover new domain-specific words and add them to the segmentation lexicon. However, datasets for new word discovery depend on manual annotation and are therefore difficult to build at scale. New word embedding poses a related problem: existing word embedding models need large training corpora in which every word appears frequently enough. In the legal field, new words emerge constantly, and there is no guarantee that each one has enough surrounding text to train an embedding model. To address these issues, this thesis makes the following contributions:

1. To address the lack of new word discovery data in specialized domains, this thesis proposes an adversarial transfer learning method. Part-of-speech tagging on a general corpus serves as the source domain, and new word discovery in the legal field serves as the target domain. BERT is used as the encoder, and the private and shared features of the tasks are extracted in three parts. To improve feature fusion, a two-layer BiLSTM combined with neural adapters fuses the features. A bilinear attention mechanism is added after the multi-head attention mechanism so that features useful for new word discovery can be extracted from the shared features.

2. To address the new word embedding problem, this thesis proposes an embedding algorithm that combines character features, subword semantics, and contextual information. First, character-level phonetic features are extracted from the target word, and its n-gram subword semantics are obtained. Next, a random feature attention mechanism generates a vector for each context of the target word, preserving the ability to capture dependencies within the input sequence while reducing model complexity. Multiple context vectors are then aggregated to enrich the word's semantic representation. Finally, meta-learning allows a model trained on general-domain text to adapt quickly to a specialized corpus.

3. A new word discovery system for the legal field is designed, with modules for user registration and login, new word discovery, and word embedding generation. Users can upload their own training data to train models, then use the trained models for new word discovery and word embedding generation. The system also calls backend interfaces to produce two-dimensional visualizations of the word embeddings.
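The abstract does not say how the bilinear attention over the shared features is parameterized. A minimal pure-Python sketch, assuming the task-private features act as queries over the shared features with score p^T W s, is shown below. The function name `bilinear_attention` and the matrix `W` are hypothetical; in a trained model `W` would be learned together with the encoder rather than fixed by hand.

```python
import math


def bilinear_attention(private, shared, W):
    """Attend from each task-private vector p over the shared vectors s_j.

    Assumed form: score(p, s_j) = p^T W s_j, softmax over j,
    output = sum_j a_j * s_j.
    Shapes: private (m, dp), shared (n, ds), W (dp, ds).
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # W s_j does not depend on the query, so compute it once per shared vector.
    projected = [matvec(W, s) for s in shared]
    dim = len(shared[0])
    outputs = []
    for p in private:
        scores = [dot(p, ws) for ws in projected]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(sc - top) for sc in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the original (unprojected) shared vectors.
        outputs.append([sum(w * s[k] for w, s in zip(weights, shared))
                        for k in range(dim)])
    return outputs


# Toy example: with an identity W the score reduces to a plain dot product.
print(bilinear_attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                         [[1.0, 0.0], [0.0, 1.0]]))
```

In the toy call the first shared vector receives almost all of the attention weight, so the output is very close to `[1.0, 0.0]`. The bilinear form lets `W` learn which directions of the shared space matter for the target task, which a plain dot product cannot do.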
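Contribution 2 builds a word's vector from subword n-grams and several context vectors, but the abstract does not give the combination rule. The sketch below assumes a precomputed table of n-gram vectors (`subword_table`, hypothetical), averages the known n-gram vectors and the context vectors separately, and mixes the two with a hypothetical weight `alpha`. The character-level phonetic features and the meta-learning step are not modeled here.

```python
def char_ngrams(word, n_min=1, n_max=3):
    """All contiguous character n-grams of `word` with n_min <= n <= n_max."""
    return [word[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(word) - n + 1)]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors, or None if the list is empty."""
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def new_word_embedding(word, subword_table, context_vectors, alpha=0.5):
    """Blend a subword-based vector with the mean of several context vectors.

    subword_table: hypothetical dict mapping n-gram strings to vectors.
    context_vectors: one vector per context in which `word` occurs.
    alpha: hypothetical weight on the subword part.
    """
    # Only n-grams that already have a vector contribute.
    known = [subword_table[g] for g in char_ngrams(word) if g in subword_table]
    sub = mean_vector(known)
    ctx = mean_vector(context_vectors)
    # Fall back to whichever source is available.
    if sub is None:
        return ctx
    if ctx is None:
        return sub
    return [alpha * a + (1 - alpha) * b for a, b in zip(sub, ctx)]
```

For an illustrative legal term, `char_ngrams("电子数据", 2, 2)` returns `["电子", "子数", "数据"]`; a rare new word can therefore borrow meaning from substrings already seen in general-domain text, while the context average supplies usage-specific information.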
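Random feature attention replaces the softmax kernel with an inner product of random feature maps, so key-value summaries can be computed once and reused for every query; this is the source of the reduced complexity the abstract mentions. The sketch below is a generic illustration, not the thesis's exact variant: it uses random Fourier features (sine and cosine of random projections with Gaussian weights) on l2-normalized queries and keys, so that phi(q)·phi(k) approximates exp(-||q-k||²/2), which for unit vectors is proportional to exp(q·k). The names `rfa`, `num_features`, and `seed` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit l2 norm."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def random_feature_map(x, W):
    """phi(x) = sqrt(1/D) [sin(Wx), cos(Wx)], rows of W drawn from N(0, I).

    phi(x).phi(y) = (1/D) sum_i cos(w_i.(x - y)), whose expectation is
    exp(-||x - y||^2 / 2).
    """
    D = len(W)
    proj = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    scale = math.sqrt(1.0 / D)
    return [scale * math.sin(p) for p in proj] + [scale * math.cos(p) for p in proj]


def rfa(queries, keys, values, num_features=2048, seed=0):
    """Approximate softmax(q.k) attention over normalized q and k."""
    rng = random.Random(seed)
    d = len(queries[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]
    phi_k = [random_feature_map(normalize(k), W) for k in keys]
    dv = len(values[0])
    F = len(phi_k[0])
    # Summaries over all keys, computed once:
    # S[f][c] = sum_j phi(k_j)[f] * v_j[c],  z[f] = sum_j phi(k_j)[f]
    S = [[sum(pk[f] * v[c] for pk, v in zip(phi_k, values)) for c in range(dv)]
         for f in range(F)]
    z = [sum(pk[f] for pk in phi_k) for f in range(F)]
    out = []
    for q in queries:
        # Per-query cost depends on F and dv, not on the number of keys.
        pq = random_feature_map(normalize(q), W)
        num = [sum(pq[f] * S[f][c] for f in range(F)) for c in range(dv)]
        den = sum(pq[f] * z[f] for f in range(F))
        out.append([x / den for x in num])
    return out
```

Because the kernel estimate is a sum of cosines, the denominator can in principle approach zero when `num_features` is small; more features reduce the variance at a cost that grows linearly in the feature count.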

Keywords/Search Tags:

New Word Discovery, Word Embedding, Adversarial Transfer Learning, Legal Field

Related items

1	Research On Question Answering Technology Towards Unstructured Documents In Military Field
2	Research And Implementation Of Legal Case Recommendation Service System Based On Deep Learning
3	Character Word Font In The Software Copyright Protection
4	Research On Enterprise Entity Recognition And Classification For Court Documents
5	Research On Criminal Charge Prediction Algorithm Based On Transfer Learning
6	Research On The Meaning Characteristics And Interpretation Of Accusation And Punishment Name In The Essential Series Of Dictionary Of Law
7	Research On The Protection Of The Right Font Font And Word Books
8	Study On The Application Of The Features Of Word Processors In Examination Of Printed Documents
9	Study On The Copyright Protection Of Computer Font Word
10	The word became flesh: An exploratory essay on Jesus's particularity and nonhuman animal