Research On Chinese Spelling Check With Data And Knowledge Enhancement

Posted on:2024-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:Q Lv

GTID:2568306941463874

Subject:Computer technology

Abstract/Summary:

Chinese spelling check is an important task in natural language processing. Its main purpose is to automatically identify and correct misspelled words in user input. Because of the complexity and ambiguity of the Chinese language, Chinese spelling check is more challenging than English spelling check. The task usually requires a large corpus and a large amount of training data to perform well, but such data is difficult to collect and update, so training data is often insufficient and current Chinese spelling check models struggle to adapt to practical applications. This thesis therefore builds a strong baseline model for the task and, focusing on the characteristics of the task, studies data augmentation and knowledge enhancement. The specific content is as follows.

Firstly, based on the inherent properties of Chinese characters, this thesis proposes a feature-enhanced Chinese spelling check model based on the sound and shape of characters. In addition to modeling sentence semantics, it uses separate pinyin and character shape encoders to independently model the sound and shape of each Chinese character in the sentence, which yields a strong baseline model.

Secondly, to address the lack of training data in the Chinese spelling check task, this thesis proposes a new pretraining task for data augmentation, namely the error consistency pretraining task. In addition, current datasets lack the continuous errors that occur in real-world scenarios. To compensate, this thesis supplements the single-character confusion set from previous work with a continuous character confusion set.

Finally, this thesis introduces user dictionaries as external knowledge and proposes a knowledge-enhanced Chinese spelling check framework based on the user dictionary, called UD. This framework can be applied to any spelling check system based on token classification models, and it automatically adapts to different correction scenarios in different fields without requiring additional training data.
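As an illustration of how confusion sets are commonly used to synthesize training errors, the following is a minimal Python sketch. It is not the thesis's implementation: the function name `corrupt`, the probabilities `p_single` and `p_multi`, and the toy contents of `SINGLE_CONFUSION` and `MULTI_CONFUSION` are all assumptions made for this example. It shows a multi-character (continuous) confusion set being tried before a single-character one, with every replacement keeping the sentence length unchanged.

```python
import random

# Hypothetical toy confusion sets; real ones are far larger.
# Each replacement has the same length as the span it replaces.
SINGLE_CONFUSION = {"在": ["再"], "的": ["地", "得"]}
MULTI_CONFUSION = {"知道": ["指导"]}  # continuous (multi-character) errors


def corrupt(sentence, rng, p_single=0.1, p_multi=0.1):
    """Return a noisy copy of `sentence` with the same length."""
    noisy = []
    i = 0
    while i < len(sentence):
        # Try a continuous-error span starting at position i first.
        span = next(
            (s for s in MULTI_CONFUSION if sentence.startswith(s, i)), None
        )
        if span is not None and rng.random() < p_multi:
            noisy.append(rng.choice(MULTI_CONFUSION[span]))
            i += len(span)
            continue
        # Otherwise fall back to a single-character substitution.
        ch = sentence[i]
        if ch in SINGLE_CONFUSION and rng.random() < p_single:
            noisy.append(rng.choice(SINGLE_CONFUSION[ch]))
        else:
            noisy.append(ch)
        i += 1
    return "".join(noisy)


if __name__ == "__main__":
    clean = "我知道你在家"
    print(corrupt(clean, random.Random(0), p_single=1.0, p_multi=1.0))
```

Because the noisy sentence has the same length as the clean one, each (noisy, clean) pair can be aligned character by character, which is the form a token classification model expects.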

Keywords/Search Tags:

Chinese Spelling Check, Pretrained Language Model, Data Augmentation, User Dictionary

Related items

1	Research On Chinese Spelling Check Technology Based On Machine Learning
2	Research And Application Of Chinese Spelling Correction Technology Incorporating Phonology And Glyph Features
3	Chinese Spelling Check Based On Neural Network
4	Research On The Traditional Chinese Spelling Error Detection
5	Research On Error Correction Method Of Chinese Short Text Based On BERT
6	Chinese Spelling Check Based On Pre-training Model
7	Research And Implementation Of The Chinese Spell Check Model Based On Deep Learning
8	Research On Chinese Spelling Error Correction Model Based On Deep Learning
9	Research On Deep Learning Error Correction Method Of Chinese Text
10	Research On Tibetan Word Spelling Check Technology Based On Bidirectional LSTM