
Research On Chinese-braille Conversion Method Based On Pre-training Model

Posted on: 2024-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Wang
Full Text: PDF
GTID: 2558307079993139
Subject: Computer Science and Technology
Abstract/Summary:
Automatic Chinese-Braille conversion is key to improving the quality of life of China's 17.31 million visually impaired people, advancing educational technology, and achieving nationwide information accessibility. However, owing to the scarcity of Chinese-Braille translation software and the current technical lag in China, existing tools cannot meet the daily learning and working needs of the visually impaired, which seriously hinders the development of Braille publishing and education for the blind in China.

Existing Chinese-Braille conversion technology falls into two broad categories: multi-step methods and single-step methods. The multi-step method follows the pipeline "Chinese characters - word segmentation - pinyin - Braille": the Chinese text to be converted is first prepared; a Braille word-segmentation corpus is built, and a segmentation model is constructed and trained according to Braille word-segmentation rules; the text is then segmented with this model. Next, a Chinese pinyin corpus is built and a pinyin-annotation model is trained, which labels each segmented word with its pinyin; the pinyin-annotated words are finally converted into Braille. This approach requires support from multiple corpora, which are costly and difficult to construct, and its multi-model pipeline is cumbersome, which harms both conversion efficiency and accuracy. The single-step method adopts the architecture of a neural machine translation model and converts Chinese into Braille in a single pass. Previous work trained existing models to explore single-step Chinese-Braille conversion but did not investigate its inner mechanism in depth; further research is therefore necessary to improve conversion accuracy.

This paper improves the self-attention-based Transformer Chinese-Braille conversion method by adding a pre-training model, BERT. BERT is first trained on a large amount of plain text; the input sequence is then converted into BERT representations, which are injected into every layer of the downstream model. The attention module of each layer interacts adaptively with the BERT representation, and each layer outputs a fusion of the BERT representation and the output of its own attention module. Based on this idea, the paper proposes three improved models: the Bes Transformer, Beds Transformer, and Best Transformer. These models convert the input sequence into the representation produced by the pre-trained BERT model, let it interact with the encoder and decoder layers of the downstream model through their attention modules, and finally obtain a fused representation of the two.

The paper evaluates Chinese-Braille conversion comprehensively through experiments. The results show that, compared with the self-attention-based Transformer conversion method, the Bes Transformer, Beds Transformer, and Best Transformer all perform better. Using the default uniform BLEU weights ω1 = ω2 = ω3 = ω4 = 0.25, the BLEU values of the Bes Transformer on the three kinds of Braille are 91.35%, 91%, and 86.85% respectively, which are 12.17%, 10.75%, and 1.9% higher than the Transformer model; the Beds Transformer reaches 94.16%, 93.02%, and 89.07%, which are 15.13%, 12.77%, and 4.44% higher; and the Best Transformer reaches 95.15%, 93.79%, and 89.67%, which are 16.07%, 13.54%, and 5.04% higher. Compared with the Transformer model, the accuracy of the three improved models on the national general Braille test set increases by 0.86%, 0.93%, and 1.02% respectively. These results demonstrate the feasibility and effectiveness of the improved models. On METEOR, the Best Transformer achieves the highest values under the three kinds of Braille: 99.63%, 99.65%, and 99.10% respectively. The combined BLEU and METEOR evaluation shows that the Best Transformer performs the deepest feature extraction and representation fusion and completes the Chinese-Braille conversion task best, with the Beds Transformer second and the Bes Transformer last. The performance of all three models improves significantly with more training data. An analysis of the conversion results identifies five classes of problems: word-segmentation differences, word-segmentation errors, polyphone errors, symbol errors, and corpus errors.

By introducing pre-training into the Chinese-Braille conversion task, this paper extracts general semantic features from plain text and fuses the source and target languages through model attention, optimizing and innovating the Chinese-Braille conversion method. China currently uses three kinds of Braille: national general Braille, double-phoneme Mandarin Braille, and prevailing Mandarin Braille. To study the conversion method, the paper collected six months of People's Daily data from 1998, converted it through the Chinese Braille digital platform with expert proofreading, and finally obtained sentence-aligned Chinese-Braille parallel corpora of about 12 million words covering the three kinds of Braille.
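The fusion idea described above — letting a downstream attention module attend over both its own input and the BERT representation — can be illustrated with a minimal NumPy sketch. All dimensions, the random data, and the 0.5/0.5 averaging are illustrative assumptions; in the thesis's models the fusion happens in every trained encoder and decoder layer, not with fixed weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8          # hidden size (hypothetical)
src_len = 5    # input sequence length (hypothetical)

h_bert = rng.normal(size=(src_len, d))   # BERT representation of the input
x = rng.normal(size=(src_len, d))        # downstream layer input

self_out = attention(x, x, x)            # ordinary self-attention
bert_out = attention(x, h_bert, h_bert)  # attention over the BERT representation
fused = 0.5 * (self_out + bert_out)      # fused output passed to the next layer
print(fused.shape)  # (5, 8)
```

The key point the sketch captures is that the query comes from the downstream layer while the keys and values of the second attention stream come from BERT, so every layer can adaptively draw on the pre-trained representation.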
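The BLEU scores above use uniform n-gram weights ω1 = ω2 = ω3 = ω4 = 0.25. A self-contained sketch of sentence-level BLEU-4 under those weights — with toy character-token sequences, not the thesis's evaluation code or corpus — looks like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(ref, hyp, weights=(0.25, 0.25, 0.25, 0.25)):
    # modified n-gram precision for n = 1..4
    precisions = []
    for n, w in enumerate(weights, start=1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # some n-gram order has no match
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))

ref = list("他是一名学生")   # reference tokens (toy example)
hyp = list("他是一个学生")   # hypothesis differing in one character
print(sentence_bleu(ref, ref))  # 1.0
print(sentence_bleu(ref, hyp))  # 0.0 — no matching 4-gram, so unsmoothed BLEU-4 is zero
```

The second case shows why corpus-level BLEU (or smoothing) is normally used for reporting: a single missing 4-gram zeroes the unsmoothed sentence score.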
Keywords/Search Tags:Chinese and Braille conversion, Pre-training model, BERT model, Neural Machine Translation, Transformer model