Research on Tibetan Word Segmentation and Part-of-Speech Tagging Based on Neural Networks

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
GTID: 2415330611452117
Subject: Engineering · Software Engineering
Abstract/Summary:
The development of Tibetan information technology has played an essential role in scientific research, education, and the daily life of Tibetan people. After years of progress, computational linguistic analysis of Tibetan has also produced solid results. Word segmentation and part-of-speech tagging are the foundation of this work, and their quality directly affects downstream Tibetan natural language processing tasks such as syntactic analysis, text classification, and sentiment analysis. Traditional statistical models can achieve reasonable results on both tasks, but they rely on manually constructed features, which limits how well they generalize to large-scale datasets. With the development of neural networks, applying them to traditional natural language processing tasks has become a popular trend. Neural network algorithms have clear advantages in automatically extracting sequence features and in training on and fitting large-scale data. Because they use distributed representations, support parallel processing, and learn from data, they can also take full advantage of GPUs and other hardware.

Based on neural network algorithms, this dissertation therefore focuses on Tibetan word segmentation and part-of-speech tagging and completes the following tasks.

Firstly, Convolutional Neural Network (CNN) and Iterated Dilated CNN (IDCNN) algorithms were introduced into a bidirectional long short-term memory (BiLSTM) network model to segment Tibetan text, with good results. A comparison of the different frameworks showed that IDCNN extracts more complete sequence feature information than CNN, which improved the segmentation performance of the model.

Secondly, for part-of-speech tagging, Tibetan vocabulary datasets were used for training and testing according to the 91-class fine-grained tag set of the "Tibetan Part-of-Speech Tagging Set for Information Processing," and the performance of the neural network algorithms on this task was evaluated.

Thirdly, to address the error accumulation that arises when word segmentation and part-of-speech tagging are performed as separate, sequential steps, an integrated Tibetan word segmentation and part-of-speech tagging framework based on neural network algorithms was designed and implemented. The experimental results showed that the integrated framework effectively improves the accuracy of both Tibetan word segmentation and part-of-speech tagging.

In summary, Tibetan word segmentation and part-of-speech tagging models based on neural network algorithms can extract sequence features automatically and thus perform both tasks effectively. This study should benefit subsequent Tibetan language processing tasks.
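One common way to integrate segmentation and part-of-speech tagging into a single sequence labeling task is to give each syllable one combined label: a word-boundary tag joined to the word's part-of-speech tag. The sketch below illustrates that idea only. The B/M/E/S boundary scheme, the function names `encode_joint` and `decode_joint`, the placeholder syllables, and the tags `n` and `v` are all assumptions made for illustration; the abstract does not state which labeling scheme the dissertation uses.

```python
from typing import List, Tuple

Word = Tuple[List[str], str]  # (syllables of the word, POS tag)


def encode_joint(words: List[Word]) -> Tuple[List[str], List[str]]:
    """Flatten words into syllables with one joint label per syllable.

    Boundary tags (assumed scheme): B = begin, M = middle, E = end,
    S = single-syllable word. The joint label is "<boundary>-<pos>".
    """
    units: List[str] = []
    labels: List[str] = []
    for syllables, pos in words:
        n = len(syllables)
        if n == 0:
            continue  # skip empty words
        if n == 1:
            bounds = ["S"]
        else:
            bounds = ["B"] + ["M"] * (n - 2) + ["E"]
        units.extend(syllables)
        labels.extend(f"{b}-{pos}" for b in bounds)
    return units, labels


def decode_joint(units: List[str], labels: List[str]) -> List[Word]:
    """Recover (syllables, POS) words from per-syllable joint labels."""
    words: List[Word] = []
    current: List[str] = []
    pos = ""
    for unit, label in zip(units, labels):
        bound, pos = label.split("-", 1)
        current.append(unit)
        if bound in ("E", "S"):  # a word ends at this syllable
            words.append((current, pos))
            current = []
    if current:  # unterminated word at the end of the sequence
        words.append((current, pos))
    return words


# Placeholder syllables and tags, not real Tibetan data.
sample = [(["a", "b", "c"], "n"), (["d"], "v")]
units, labels = encode_joint(sample)
print(units)   # ['a', 'b', 'c', 'd']
print(labels)  # ['B-n', 'M-n', 'E-n', 'S-v']
```

With labels of this form, a single tagger predicts word boundaries and part-of-speech tags in one pass, so a segmentation error cannot be passed on to a separate tagging stage.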
Keywords/Search Tags:Tibetan, word segmentation, part-of-speech tagging, neural network, integration