
Research On Mongolian Morphological Analysis Using Deep Neural Network

Posted on: 2020-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: S A Chen
Full Text: PDF
GTID: 2415330578954965
Subject: Computer technology
Abstract/Summary:
Unlike Chinese, which is an isolating language, Mongolian is an agglutinative language. Mongolian words consist of roots, stems, and affixes. Splitting Chinese text into words is usually called Chinese word segmentation, whereas Mongolian requires morphological analysis, which covers both the recognition of the morphemes that form a word and the part-of-speech tagging of those morphemes. Mongolian script can generally be divided into Hudum Mongolian and Latin Mongolian, namely old Mongolian and new Mongolian. Because the two scripts differ in writing rules and linguistic characteristics, they are very difficult to convert into each other. Mongolian words are naturally separated by spaces, so word segmentation of the kind needed for Chinese is unnecessary.

Mongolian has a rich variety of aspects, voices, and moods, and its words are formed by attaching different suffixes to the root and the stem. At the granularity of morphemes, therefore, each word must be segmented into its components in order to identify its root, stem, and affixes. Mongolian also has a large number of parts of speech, and different levels of part-of-speech tagging are needed for different granularities within a sentence. However, many Mongolian morphemes share the same form but have different parts of speech, and this ambiguity makes Mongolian part-of-speech tagging quite complex.

Traditional methods for Mongolian morphemic segmentation and part-of-speech tagging are mainly based on rules, statistics, or a combination of the two. These methods require heavy feature engineering, and their accuracy is quite low. To address these problems, this thesis proposes a Mongolian morphological analysis method based on deep neural networks. The method does not require manually created rules or feature templates. The main research contents and innovations of this thesis include:
(1) Creating a complete Mongolian-Latin conversion table, including a character conversion table, a punctuation conversion table, and a special-word conversion table.
(2) Integrating Mongolian linguistic knowledge into data preprocessing and postprocessing, including special handling of Mongolian control characters, sorting of affixes by word frequency, anti-division of the original corpus, manual correction, word boundary recovery, named entity recovery, and part-of-speech dictionary restoration.
(3) Proposing a Mongolian morphological analysis method based on deep neural networks. The method designs a new six-character labeling method for annotating Mongolian morphemes, and its morphemic segmentation and part-of-speech tagging system adopts a Bi-LSTM+CRF framework. The experimental results demonstrate the effectiveness of the proposed method.
(4) Putting forward a Mongolian part-of-speech tagging method based on a neural language model. The experimental results show that, by integrating the pre-trained ELMo language model with a character-level Bi-LSTM language model, the proposed method can effectively improve the accuracy of morpheme-level part-of-speech tagging for Mongolian.
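To make the idea of character-level labeling for morpheme segmentation concrete, here is a minimal Python sketch. The abstract does not spell out the thesis's six-character labeling method, so this example substitutes a generic four-label BMES scheme (Begin, Middle, End, Single) as a stand-in. The function names `encode_bmes` and `decode_bmes` and the placeholder strings are assumptions for illustration only, and the tagger itself (for example a Bi-LSTM+CRF model) is not shown.

```python
from typing import List, Tuple


def encode_bmes(morphemes: List[str]) -> List[Tuple[str, str]]:
    """Label each character by its position inside its morpheme.

    Hypothetical stand-in for the thesis's six-label scheme:
    B = first char, M = middle char, E = last char, S = one-char morpheme.
    """
    labeled: List[Tuple[str, str]] = []
    for m in morphemes:
        if not m:  # skip empty morphemes
            continue
        if len(m) == 1:
            labeled.append((m, "S"))
            continue
        labeled.append((m[0], "B"))
        labeled.extend((ch, "M") for ch in m[1:-1])
        labeled.append((m[-1], "E"))
    return labeled


def decode_bmes(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild morphemes from per-character tags (e.g. a tagger's output)."""
    morphemes: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a morpheme boundary follows this character
            morphemes.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        morphemes.append(current)
    return morphemes


if __name__ == "__main__":
    # Placeholder strings, not real Mongolian morphemes.
    pairs = encode_bmes(["abc", "de", "f"])
    chars = [c for c, _ in pairs]
    tags = [t for _, t in pairs]
    print(tags)
    print(decode_bmes(chars, tags))
```

With a scheme like this, segmentation becomes a sequence labeling problem: the model predicts one tag per character, and the decoding step turns the tag sequence back into morphemes. A richer label set could also encode morpheme type or part of speech in the same pass.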
Keywords/Search Tags:Mongolian, Morphological Analysis, Morphological Segmentation, Part of Speech Tagging, Deep Neural Network