A Study On Dependency Parsing For Indonesian

Posted on: 2020-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: S H Fu
Full Text: PDF
GTID: 2415330590980618
Subject: Management Science and Engineering
Abstract/Summary:
Economic cooperation and trade between China and Indonesia have grown steadily since the two countries established a comprehensive strategic partnership in 2013. Collecting and analyzing the news and information posted on Indonesian news websites and social media is one way to learn more about the country. Because most of this material is written in Indonesian, automatic processing and analysis of these texts would be of great convenience to researchers in related fields. However, compared with high-resource languages such as English and Mandarin Chinese, Indonesian is a low-resource language that receives little attention in natural language processing (NLP). As a result, its language resources and processing tools are limited, which in turn restricts NLP research on Indonesian. Syntactic parsing is an important part of NLP because it links upper-layer applications with lower-layer technologies. Yet research on syntactic parsing for Indonesian is scarce: large-scale Indonesian treebanks are lacking, as are studies of how to apply state-of-the-art methods to Indonesian. To address these problems, this work builds on existing theories and methods for dependency parsing, explores the idiosyncrasies of Indonesian, and proposes parsing methods better suited to Indonesian. Specifically, this study includes:

1. Deep learning-based dependency parsing for Indonesian

We summarize several commonly used deep learning-based dependency parsing methods and apply them to Indonesian text to validate their feasibility. In addition, based on some characteristics of Indonesian, we amend a publicly available Indonesian dataset, which could help train parsers more suitable for Indonesian. In our experiments, we use conventional machine learning methods as the baseline and compare three types of neural dependency parsing methods. The results show that the deep learning-based models notably outperform the conventional ones. For the best model, the unlabeled attachment score (UAS) reaches over 87% and the labeled attachment score (LAS) over 82%. Compared with previous work on dependency parsing for Indonesian, these accuracies are promising and make the deep learning-based models viable candidates for Indonesian parsers. Meanwhile, amending the dataset introduces fine-grained syntactic information, which could help us better understand and analyze the syntactic structures of Indonesian and hence build language-specific syntactic parsing models.

2. The construction of an Indonesian dependency treebank by means of English-Indonesian parallel sentences

Making full use of English-Indonesian parallel sentences, we investigate some language characteristics of Indonesian in order to build an Indonesian treebank. We first obtain word alignment information from a large number of English-Indonesian parallel sentences and then project the syntactic relations of the English sentences onto the corresponding Indonesian ones. After examining the grammatical differences between the two languages, we propose several revision rules to correct wrongly projected relations. Compared with manual annotation of syntactic relations, our method reduces human workload and produces annotated text more quickly, and could therefore be an effective way to construct a large-scale treebank. In this preliminary work, we build an Indonesian dependency treebank of 3000 sentences. With our corpus as the training set, the unlabeled attachment score on the manually annotated Indonesian test set reaches over 70%.
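
The projection step and the attachment scores mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `project_dependencies` and `attachment_scores`, the one-to-one alignment dictionary, the relation labels, and the toy sentence pair are all assumptions made for illustration, and the revision rules the abstract refers to are not modeled here.

```python
from typing import Dict, List, Tuple

# An arc is (head position, relation label) for one token.
# Positions are 1-based; head 0 stands for the artificial ROOT.
Arc = Tuple[int, str]


def project_dependencies(
    src_arcs: List[Arc], alignment: Dict[int, int], tgt_len: int
) -> List[Arc]:
    """Copy arcs from a parsed source sentence onto an aligned target sentence.

    src_arcs[i - 1] holds the (head, label) of source token i.
    alignment maps source positions to target positions (assumed one-to-one).
    Target tokens that receive no arc keep the placeholder (-1, "_").
    """
    tgt_arcs: List[Arc] = [(-1, "_")] * tgt_len
    for src_dep, (src_head, label) in enumerate(src_arcs, start=1):
        if src_dep not in alignment:
            continue  # unaligned source word: nothing to project
        if src_head == 0:
            tgt_head = 0  # the root stays the root
        elif src_head in alignment:
            tgt_head = alignment[src_head]
        else:
            continue  # the head has no counterpart in the target sentence
        tgt_arcs[alignment[src_dep] - 1] = (tgt_head, label)
    return tgt_arcs


def attachment_scores(pred: List[Arc], gold: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS): share of tokens with the correct head,
    and share with both the correct head and the correct label."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    uas_hits = sum(p[0] == g[0] for p, g in zip(pred, gold))
    las_hits = sum(p == g for p, g in zip(pred, gold))
    return uas_hits / len(gold), las_hits / len(gold)


# Toy pair (illustrative only):
#   English:    I(1) read(2) a(3) red(4) book(5)
#   Indonesian: Saya(1) membaca(2) buku(3) merah(4)
english_arcs = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "amod"), (2, "obj")]
alignment = {1: 1, 2: 2, 4: 4, 5: 3}  # "a" has no Indonesian counterpart
projected = project_dependencies(english_arcs, alignment, tgt_len=4)

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
uas, las = attachment_scores(projected, gold)
print(projected, uas, las)
```

In this toy pair the adjective follows the noun in Indonesian, yet the projected arcs still attach "merah" to "buku" because projection follows the alignment rather than word order. Real alignments are noisier (one-to-many links, unaligned heads), which is where correction rules of the kind the abstract describes would come in.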
Keywords/Search Tags:Indonesian, dependency parsing, neural network, English-Indonesian parallel sentences, dependency treebank