Tibetan Language Model Integrating Morphological Structure And Grammatical Relations

Posted on: 2021-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T J Geng
Full Text: PDF
GTID: 1485306548474384
Subject: Computer application technology
Abstract/Summary:
Language is the most important means of exchanging information in everyday life, and language modelling is a foundational task in language research. A language model provides effective word representations and a probabilistic representation of word sequences, and it can be applied to related research such as speech recognition, machine translation, handwriting recognition, and syntactic analysis. As a core component of natural language processing systems, language models have achieved relatively good results for languages with relatively abundant training corpora. The study of Tibetan language models, however, is still in its infancy. Given the lack of Tibetan corpus resources and the scarcity of researchers, existing work mostly applies research methods developed for English, Chinese and Japanese. In this context, we carry out a series of systematic and in-depth studies, starting from the model structure of deep neural networks. The aim is, on the one hand, to verify the effectiveness of the models we build and, on the other hand, to draw on Tibetan morphological structure to obtain more useful information from a limited corpus and so compensate for the lack of resources.

Tibetan is a low-resource language, and there are currently no open, standard audio and text data resources for it. Based on the characteristics of the Lhasa dialect of Tibetan and the particularities of Tibetan text, we took phoneme balance and text domain into account and built a Tibetan audio and text corpus. Motivated by the continuation errors and insertions of some function words in Tibetan sentences, we focus on the influence of suffixes on function words in Tibetan and the influence of morphological verbs on additional words.

First, we propose a language model of the static morphological structure of Tibetan. We found that, unlike in other languages, the static morphological and structural relationship unique to Tibetan (that is, the suffix-to-function-word continuation relationship) seriously affects the semantic understanding of Tibetan sentences. Specifically, in addition to the information of the character itself, we also integrate the character's suffix information, so that the character can be connected more accurately to the correct function word. Taking the static morphological structure into account therefore corrects some grammatical errors in the sentence, so that the sentence semantics can be expressed accurately.

Second, we propose a language model of the dynamic morphological structure of Tibetan. We found that the corpus contains some dynamic morphological and structural relationships (i.e., morphological inflections in Tibetan). Words that undergo morphological inflection are special and very important in Tibetan. Certain words have an important influence on the semantics of sentences, especially homophones in speech recognition, and this may be a cause of prediction errors. The more expected word pairs of this kind there are, the lower the probability that they are replaced. We therefore transform the morphological verbs in Tibetan, so that they not only receive a higher predicted probability estimate but the semantics also become more accurate.

Finally, we propose a Tibetan language model that combines static and dynamic morphological structures. Our corpus statistics show that the static morphological relationship addresses the problem of grammatical errors in the sentence, while the dynamic morphological structure changes the weight of the morphological verb in the sentence. The combined model effectively integrates the influence of both structures and increases the weight of morphological verbs, and its performance improves over considering either characteristic alone.

To summarize, by building and analysing a Tibetan corpus, we identified some characteristics that have an important influence on the Tibetan language. Mapping the effect of long suffixes on function words, and of morphological verbs on sentences, into the Tibetan language model can effectively improve the recognition and understanding of Tibetan. In addition to speech recognition, our work can be applied to other areas of Tibetan natural language processing such as handwriting recognition, machine translation and syntactic analysis. We hope that this work will contribute to future research on Tibetan information processing.
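
As a toy illustration of the suffix-to-function-word continuation relationship described in the abstract, the Python sketch below pairs each syllable with its suffix letter and checks whether a following genitive particle is consistent with that suffix. It is not the dissertation's neural model, which the abstract does not specify in detail. The function names, the `<none>` label, and the crude "last character" suffix heuristic are assumptions made for illustration only; the particle table follows the conventional rule for the Tibetan genitive particle.

```python
# Genitive particle forms (written here without the syllable separator).
KYI, GI, GYI, HI, YI = "ཀྱི", "གི", "གྱི", "འི", "ཡི"
NO_SUFFIX = "<none>"  # hypothetical label: syllable has no suffix letter

# Conventional rule: which genitive forms may follow which suffix letter.
GENITIVE_AFTER = {
    "ད": (KYI,), "བ": (KYI,), "ས": (KYI,),
    "ག": (GI,), "ང": (GI,),
    "ན": (GYI,), "མ": (GYI,), "ར": (GYI,), "ལ": (GYI,),
    "འ": (HI, YI),
    NO_SUFFIX: (HI, YI),
}


def suffix_of(syllable):
    """Crude heuristic (an assumption, not a real morphological analyser):
    treat the final character as the suffix if it is one of the ten suffix
    letters and the syllable has more than one character."""
    if len(syllable) > 1 and syllable[-1] in GENITIVE_AFTER:
        return syllable[-1]
    return NO_SUFFIX


def features(syllable):
    """Input unit for a model: the syllable itself plus its suffix information."""
    return syllable, suffix_of(syllable)


def continuation_ok(syllable, particle):
    """True if the genitive particle is consistent with the syllable's suffix."""
    return particle in GENITIVE_AFTER[suffix_of(syllable)]


# Example: the syllable for "Tibet" ends in the suffix letter ད.
print(features("བོད"), continuation_ok("བོད", KYI))
```

In a neural language model, the second element of `features` would typically be embedded and combined with the syllable embedding, so the model can learn such continuation constraints from data rather than from a hand-written table.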
Keywords/Search Tags: Tibetan language model, Static morphological structure, Tibetan grammar, Dynamic morphological structure, Automatic speech recognition