A Lightweight Multilingual Translation Model For Asian Languages

Posted on: 2024-02-26 | Degree: Master | Type: Thesis | Institution: University | Candidate: Md Mosharof Hossain | GTID: 2568307106983169 | Subject: Computer Science and Technology

Abstract/Summary:

Multilingual machine translation for Asian languages is a complex and challenging task: it requires converting text across different writing systems, grammatical structures, and cultural contexts. Asian languages such as Chinese, Tamil, Urdu, Punjabi, and Persian have unique features that pose significant challenges to traditional machine translation systems. For instance, they often use complex sentence structures, idiomatic expressions, and honorifics, all of which require a deep understanding of the language and its culture to translate accurately. Deep learning and neural-network techniques are improving multilingual machine translation for these languages, and Transformer-based models such as BERT and GPT-3 have raised translation quality. Even so, few multilingual translation models for Asian languages have been studied, and existing models and feature-selection strategies have limitations. Most downstream pipelines rely on pre-trained Chinese character-based models, which can misread words because they ignore the words' true meanings. Moreover, many Asian languages fall into the "low-resource" category: a shortage of parallel training data and computing resources makes it difficult to build high-quality MT models for them.

We therefore investigated how to reduce the translation mismatch of character-based pre-trained models and how to design a lightweight multilingual translation model, validating our methods by comparing the proposed model's simulation results against benchmark and baseline models using the BLEU score. The research elements are as follows. To improve word representation, we use a similarity weight to map each word's embedding onto its character embeddings, incorporate character-image features to better capture word separation, and apply alignment-based attention to mask less important characters. To reduce the propagation of word-segmentation errors, we propose an ensemble method that aggregates the segmentation results of several tokenizers (sketched below). On four Chinese NLP tasks (sentiment classification, phrase-pair matching, natural language inference, and machine reading comprehension), our technique outperforms the baseline pre-trained models BERT, BERT-wwm, and ERNIE, resolving misread words by recovering their true meanings.
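The abstract names the segmentation ensemble but not its aggregation rule. As a minimal sketch, assuming a majority vote over token-boundary positions (the function name, the 0.5 threshold, and the example tokenizer outputs are all hypothetical), it might look like this in Python:

```python
# A minimal sketch of boundary-voting over tokenizer outputs. The abstract
# names the ensemble but not its aggregation rule, so the majority-vote
# scheme, function name, and threshold below are assumptions.
from collections import Counter

def aggregate_segmentations(sentence, segmentations, threshold=0.5):
    """Keep a token boundary only if enough tokenizers agree on it.

    A boundary is the character index where one token ends and the next
    begins; a split proposed by a single tokenizer is outvoted, which
    limits the propagation of that tokenizer's segmentation errors.
    """
    votes = Counter()
    for tokens in segmentations:
        position = 0
        for token in tokens[:-1]:          # the final token ends the sentence
            position += len(token)
            votes[position] += 1
    kept = sorted(p for p, v in votes.items()
                  if v / len(segmentations) >= threshold)
    cuts = [0] + kept + [len(sentence)]
    return [sentence[a:b] for a, b in zip(cuts, cuts[1:])]

# Three hypothetical tokenizers disagree on how to segment one sentence:
segs = [
    ["我", "喜欢", "自然", "语言", "处理"],
    ["我", "喜欢", "自然语言", "处理"],
    ["我", "喜欢", "自然语言处理"],
]
print(aggregate_segmentations("我喜欢自然语言处理", segs))
# -> ['我', '喜欢', '自然语言', '处理']: the extra split inside 自然语言,
#    proposed by only one tokenizer (1/3 votes), is discarded.
```

Voting on individual boundaries rather than on whole segmentations lets tokenizers that only partially agree still reinforce each other.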
We then designed a lightweight multilingual translation model for Asian languages, covering data collection, preprocessing, model architecture, training, evaluation, and optimization. The model handles these languages' unique linguistic features without compromising translation quality. It uses a Transformer architecture, which has become the dominant design in NLP, and we reduce the number of model parameters without sacrificing translation accuracy. The model is trained on Chinese, Japanese, Korean, Thai, Vietnamese, and Indonesian data, and it beats the gold-standard multilingual translation models in translation quality, notably for Asian languages. We present experimental results and model-based simulations for the proposed multilingual translation framework, EHB-Net, the first hybrid deep learning model for Asian languages built on an upgraded BERT model. Comparing the proposed model's simulation results with the benchmark and baseline models shows that the framework surpasses the reference models in accuracy and loses less data than the current reference models.
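The abstract does not give EHB-Net's configuration, so the following PyTorch sketch only illustrates what a parameter-reduced Transformer for multilingual NMT can look like; the dimensions, the shared multilingual vocabulary, and the weight tying are assumptions for illustration, not the thesis's reported design.

```python
# Hypothetical "lightweight" Transformer NMT model: fewer layers, a smaller
# model dimension, and a vocabulary shared across languages. All sizes are
# illustrative assumptions, not the configuration reported in the thesis.
import torch
import torch.nn as nn

class LightweightNMT(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256,
                 nhead=4, num_layers=3, dim_ff=1024):
        super().__init__()
        # One embedding table shared by source and target saves parameters
        # in a multilingual setting where languages share a vocabulary.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)
        self.out.weight = self.embed.weight  # tie output and embedding weights

    def forward(self, src, tgt):
        # Causal mask so each target position attends only to earlier ones.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=tgt_mask)
        return self.out(h)

model = LightweightNMT()
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")  # roughly 14M with these sizes
```

Weight tying and reduced depth and width are standard ways to shrink an NMT model; whether EHB-Net uses these particular techniques is not stated in the abstract.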
This study also grouped languages for neural machine translation. According to BLEU scores, clustering the 7 Asian languages by learned language embeddings outperforms clustering by prior knowledge (language family), which fits the hypothesis that these languages differ in word order and grammatical structure (illustrated below). The encoder-decoder structure represents several languages accurately while preserving their individual characteristics. We also demonstrated the model's performance with limited training data: despite the scarcity of data, our model remained competitive.
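As a sketch of the clustering step, assuming the language embeddings come from a trained multilingual model (random stand-ins are used here) and that agglomerative clustering is an acceptable substitute for whatever grouping method the thesis used:

```python
# A minimal sketch of clustering languages by learned language embeddings
# instead of by language family. The embedding vectors here are random
# stand-ins (a real run would take them from the trained NMT model), and
# the language codes and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

languages = ["zh", "ja", "ko", "th", "vi", "id", "ms"]  # 7 hypothetical codes
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(languages), 64))

# L2-normalize so Euclidean distances order pairs like cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

labels = AgglomerativeClustering(n_clusters=3,
                                 linkage="average").fit_predict(embeddings)
for c in range(3):
    print(f"cluster {c}:", [l for l, lab in zip(languages, labels) if lab == c])

# Each cluster would then share one translation model, and the grouping is
# judged by the BLEU of the models trained on each cluster.
```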
After this analysis, we again compared the proposed model's simulation results with the benchmark and baseline models using the BLEU score. To conclude, the lightweight multilingual translation model solves the Asian-language translation problem and is suitable for real-world applications.

Keywords/Search Tags: Neural Machine Translation (NMT), Sequence-to-sequence (Seq2Seq) model, Transformer model, Attention mechanism, Preprocessing, Tokenization, Word Segmentation, Multilingual NMT (MNMT), Hybrid NMT, Encoder-Decoder architecture