Research on Chinese-Arabic Machine Transliteration Based on Deep Learning

Posted on: 2024-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
GTID: 2568306926485054
Subject: Computer Science and Technology
Abstract/Summary:
Transliteration is the process of rendering named entities, such as the names of people, places, and institutions, from one language into another on the basis of phonetic similarity. Machine transliteration plays an important role in natural language applications such as machine translation and multilingual text processing. With growing trade and cultural exchange between China and the Arab region, and with the comprehensive promotion of the "One Belt, One Road" initiative, research on Chinese-Arabic machine transliteration has become increasingly important. Building on existing research in China and abroad, this thesis carries out a series of studies. Its main contributions and innovations are as follows:
1. To address the lack of Chinese-Arabic transliteration corpora, this thesis collects and organizes data to build a Chinese-Arabic personal name transliteration dataset from Wikidata and Chinese-Arabic news sources. After data processing and manual curation, a standard dataset containing a total of 29,762 personal names was obtained.
2. The core of this research is generating transliterations of Chinese-Arabic personal names. The thesis introduces and implements three transliteration methods, based respectively on WFST (weighted finite-state transducers), an LSTM encoder-decoder, and the Transformer model. Its main innovation is an improved Transformer model for transliteration. For Chinese-to-Arabic transliteration, it proposes an improved Transformer based on dual attention units; for Arabic-to-Chinese transliteration, it introduces a gating-based adaptive weighting method to improve the model's encoder. Experimental results show that the two improved Transformer models increase accuracy by 0.43 and 0.53 percentage points, respectively.
3. The thesis also studies the reranking of candidate transliteration results in both directions (Chinese to Arabic and Arabic to Chinese) to further improve the quality and accuracy of the output. Two reranking methods are used, one based on a knowledge graph and one based on linear combination, and the linear-combination method is further improved. Experimental results show that reranking with the improved linear combination raises the accuracy of Arabic-to-Chinese transliteration by 1.1 percentage points.
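The abstract does not say how the gating-based adaptive weighting in the Arabic-to-Chinese encoder is computed, or which representations it combines. The minimal Python sketch below shows one common form of gated fusion, offered only as an illustration under that assumption: a sigmoid gate computed from two same-sized vectors decides, per dimension, how much of each to keep. The names `h_a`, `h_b`, `w_gate`, `b_gate`, and `gated_fusion` are hypothetical and are not taken from the thesis.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x len(x)) matrix by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def gated_fusion(h_a: Vector, h_b: Vector, w_gate: Matrix, b_gate: Vector) -> Vector:
    """Adaptively weight two same-sized representations with a sigmoid gate.

    gate = sigmoid(W_gate @ [h_a; h_b] + b_gate)
    out  = gate * h_a + (1 - gate) * h_b   (element-wise)

    w_gate must have len(h_a) rows and 2 * len(h_a) columns.
    """
    d = len(h_a)
    if len(h_b) != d or len(w_gate) != d or len(b_gate) != d:
        raise ValueError("dimension mismatch")
    if any(len(row) != 2 * d for row in w_gate):
        raise ValueError("w_gate rows must have 2 * d columns")
    concat = h_a + h_b  # list concatenation, i.e. [h_a; h_b]
    gate = [sigmoid(z + bias) for z, bias in zip(matvec(w_gate, concat), b_gate)]
    return [g * a + (1.0 - g) * b for g, a, b in zip(gate, h_a, h_b)]


if __name__ == "__main__":
    # Toy, hand-picked values (not learned): zero weights and bias give gate = 0.5,
    # so the output is the element-wise average of h_a and h_b.
    zeros = [[0.0] * 4 for _ in range(2)]
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], zeros, [0.0, 0.0]))  # [2.0, 3.0]
```

In a trained model the gate parameters would be learned, so the mix between the two inputs can differ per position and per dimension instead of being a fixed average.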
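The abstract names linear-combination reranking but does not list its features, weights, or the improvement made to it. The minimal Python sketch below shows only the generic idea, under that caveat: each candidate in an n-best list receives a score equal to a weighted sum of feature values, and the list is re-sorted by that score. The `Candidate` class, the feature names `model_logprob` and `name_prior`, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Sequence


@dataclass
class Candidate:
    text: str
    # Feature values for this candidate; the feature names used below are hypothetical.
    features: Dict[str, float] = field(default_factory=dict)


def linear_score(cand: Candidate, weights: Mapping[str, float]) -> float:
    """Weighted sum of feature values; a feature missing from the candidate counts as 0."""
    return sum(w * cand.features.get(name, 0.0) for name, w in weights.items())


def rerank(cands: Sequence[Candidate], weights: Mapping[str, float]) -> List[Candidate]:
    """Return candidates sorted by descending linear score (ties keep their input order)."""
    return sorted(cands, key=lambda c: linear_score(c, weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical n-best list for the Arabic name "محمد" with made-up feature values.
    nbest = [
        Candidate("穆罕麦德", {"model_logprob": -1.0, "name_prior": -2.0}),
        Candidate("穆罕默德", {"model_logprob": -1.2, "name_prior": 0.2}),
    ]
    weights = {"model_logprob": 1.0, "name_prior": 0.5}
    print([c.text for c in rerank(nbest, weights)])  # ['穆罕默德', '穆罕麦德']
```

With only the model score, the first candidate would stay on top; adding a second weighted feature is what lets the reranker promote a candidate the generator ranked lower.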
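The abstract reports gains in "percentage points" without defining the accuracy metric. Assuming it is top-1 word accuracy (the share of names whose first-ranked output exactly matches a reference transliteration, as commonly used in transliteration evaluation), the following Python sketch computes it and contrasts an absolute gain in percentage points with a relative gain. The function names are hypothetical.

```python
from typing import Sequence


def top1_accuracy(predictions: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Fraction of names whose top-1 output matches one of its reference transliterations."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("empty evaluation set")
    hits = sum(1 for pred, refs in zip(predictions, references) if pred in refs)
    return hits / len(predictions)


def gain_in_percentage_points(baseline: float, improved: float) -> float:
    """Absolute difference between two accuracies given as fractions, in percentage points."""
    return 100.0 * (improved - baseline)


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent (shown for contrast)."""
    return 100.0 * (improved - baseline) / baseline
```

For instance, moving from a hypothetical 60.0% to 61.1% accuracy is a gain of 1.1 percentage points but a relative improvement of only about 1.8%.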
Keywords/Search Tags: Machine Transliteration, Deep Learning, Transformer, Transliteration Generation, Candidate Transliteration Results Reranking