A Study on Chinese-Vietnamese Chunk Alignment Incorporating Dependency Relations

Posted on: 2019-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2435330566983705
Subject: Systems Engineering
Abstract/Summary:
In recent years, machine translation has gradually become an important technical means of alleviating the language barriers people face when communicating with each other. Chunks (or phrases) play an important role in machine translation: using chunks instead of words as the basic translation units makes it easier to adjust the order and relevance of words both locally (intra-chunk) and globally (inter-chunk). It is therefore of great value to study how to align Chinese and Vietnamese chunks, building on previous work, and to construct a large Chinese-Vietnamese chunk-aligned corpus. At present, bilingual chunk recognition for Chinese-English and Japanese-English has achieved satisfactory results, but research on Chinese-Vietnamese is still very rare. This thesis discusses the factors that affect the quality of Chinese-Vietnamese chunk alignment and analyzes the problems in the alignment process. Based on the linguistic features of Vietnamese and the current state of research, the following work has been completed:

(1) A Vietnamese chunk analysis method based on BiLSTM-CRF is proposed. To improve the accuracy of Vietnamese chunk labeling and to reduce the large number of features required by traditional chunk recognition, a neural-network-based Vietnamese chunk analysis model was constructed. Working from word segmentation and part-of-speech tagging, and without manually adding any Vietnamese language features, a BiLSTM-CRF model is used to perform Vietnamese chunk analysis.

(2) An LSTM-based method with an attention mechanism is proposed for Vietnamese inter-chunk dependency analysis. To improve the accuracy of Chinese-Vietnamese chunk alignment, an attention-based LSTM model was trained, on top of Vietnamese chunk recognition, as an inter-chunk dependency analysis model, which addresses the problem of dependency analysis among Vietnamese chunks. It provides important inter-chunk dependency features for Chinese-Vietnamese chunk alignment and, at the same time, simplifies dependency parsing to a certain extent and improves the granularity of the analysis.

(3) A Chinese-Vietnamese chunk alignment method that integrates dependency relations is proposed. Integrating dependency relations alleviates the problem of long-distance dependencies between bilingual chunks. From the constructed features, the three most useful ones are selected by using a conditional random field to calculate the gain value of each feature; they include the co-occurrence feature of bilingual chunks and the relationship between bilingual chunks. To reduce the amount of computation in the model, the matching scores of bilingual chunks are calculated from the selected features, and the Chinese-Vietnamese chunk alignment results are obtained.
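
As a rough illustration of the score-based matching step in (3), the sketch below scores chunk pairs with a single co-occurrence feature and aligns them greedily. This is not the thesis's method: the Dice coefficient, the greedy one-to-one matching, the threshold, the toy corpus, and the function names `dice_scores` and `greedy_align` are all assumptions made for illustration, and the dependency-relation features described in the abstract are not modeled here.

```python
from collections import Counter
from itertools import product


def dice_scores(parallel_chunks):
    """Dice co-occurrence score for every chunk pair seen in the corpus.

    parallel_chunks: list of (zh_chunks, vi_chunks), one per sentence pair.
    """
    zh_count, vi_count, pair_count = Counter(), Counter(), Counter()
    for zh_chunks, vi_chunks in parallel_chunks:
        zh_set, vi_set = set(zh_chunks), set(vi_chunks)
        zh_count.update(zh_set)
        vi_count.update(vi_set)
        pair_count.update(product(zh_set, vi_set))
    return {
        (zh, vi): 2.0 * n / (zh_count[zh] + vi_count[vi])
        for (zh, vi), n in pair_count.items()
    }


def greedy_align(zh_chunks, vi_chunks, scores, threshold=0.6):
    """Greedy one-to-one alignment: take the best-scoring free pair first."""
    candidates = sorted(
        (
            (scores.get((zh, vi), 0.0), i, j)
            for i, zh in enumerate(zh_chunks)
            for j, vi in enumerate(vi_chunks)
        ),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used_zh, used_vi, links = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates score even lower
        if i in used_zh or j in used_vi:
            continue
        used_zh.add(i)
        used_vi.add(j)
        links.append((i, j))
    return sorted(links)


# Hypothetical toy corpus of already-chunked sentence pairs.
corpus = [
    (["我", "喜欢", "苹果"], ["tôi", "thích", "táo"]),
    (["我", "吃", "香蕉"], ["tôi", "ăn", "chuối"]),
    (["他", "喜欢", "香蕉"], ["anh ấy", "thích", "chuối"]),
    (["他", "吃", "苹果"], ["anh ấy", "ăn", "táo"]),
]
scores = dice_scores(corpus)
# Chunk order differs between the two sides; links are (zh_index, vi_index).
print(greedy_align(["我", "喜欢", "苹果"], ["táo", "tôi", "thích"], scores))
# [(0, 1), (1, 2), (2, 0)]
```

A fuller system in the spirit of the abstract would combine several selected features, including dependency relations between chunks, into the match score rather than relying on co-occurrence alone.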
Keywords/Search Tags:Vietnamese, Chinese, Chunk alignment, Dependencies, LSTM