Classification Of Tibetan Problems For "Campus Allknowing"

Posted on: 2020-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: L P Sun
Full Text: PDF
GTID: 2415330572993888
Subject: Computer technology
Abstract/Summary:
Question answering systems have become a research hotspot at universities and research institutions, and question classification is a prerequisite for building one. Research on Chinese question classification is mature, but few studies address Tibetan. This thesis takes Northwest University for Nationalities as its specific domain and studies question classification within the Tibetan question analysis module of a national college question answering system.

The thesis first analyzes how Tibetan questions differ from ordinary text and what characterizes them, and then classifies the existing corpus according to the characteristics of the question set. The corpus is a set of Tibetan campus questions from Northwest University for Nationalities. It is small, and the questions are short and carry few features, so an overly fine classification would make features hard to recognize and reduce the discrimination between classes. All questions are therefore divided into four categories: school profile, education and teaching, university culture, and service guarantee. Once the corpus is assembled, it is preprocessed, with word segmentation carried out by the Tibetan word segmentation system developed by faculty at Northwest University for Nationalities.

For text representation, the thesis uses word vectors: the skip-gram model of word2vec transforms the question text into low-dimensional, dense word vectors. This representation avoids the curse of dimensionality caused by sparse high-dimensional features and also makes it possible to measure similarity between words. Each question is then fed into a convolutional neural network (CNN) as a two-dimensional matrix. Given the characteristics and size of the question set, the network consists of an input layer, a convolutional layer, a pooling layer, and a fully connected layer. The convolutional and pooling layers extract question features, and a softmax classifier produces the final classification. To evaluate the CNN on Tibetan question classification, the thesis compares it with two traditional machine learning methods, naive Bayes and KNN. The experimental results show that the CNN classifies better than both and works well for Tibetan question classification.
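The pipeline described in the abstract (word vectors stacked into a two-dimensional matrix, then convolution, pooling, a fully connected layer, and softmax) can be illustrated with a minimal forward pass in plain Python. This is only a sketch under stated assumptions, not the thesis's implementation: the vector dimension, filter count, window size, ReLU activation, and max-over-time pooling are all illustrative choices, the weights and word vectors are random stand-ins rather than trained skip-gram embeddings, and every function name is hypothetical.

```python
import math
import random

# The four categories named in the abstract (English labels are paraphrases).
CLASSES = ["school profile", "education and teaching",
           "university culture", "service guarantee"]


def conv_max_pool(matrix, filt, bias):
    """Slide one filter over consecutive word windows, apply ReLU, keep the max."""
    height = len(filt)  # number of words covered by the filter
    best = 0.0          # ReLU outputs are never below zero
    for start in range(len(matrix) - height + 1):
        total = bias
        for i in range(height):
            for x, w in zip(matrix[start + i], filt[i]):
                total += x * w
        best = max(best, max(0.0, total))
    return best


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(matrix, filters, biases, fc_weights, fc_bias):
    """Forward pass: convolution, max pooling, fully connected layer, softmax."""
    features = [conv_max_pool(matrix, f, b) for f, b in zip(filters, biases)]
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(scores)


# Toy setup (assumed sizes): a 6-word question with 8-dimensional word vectors.
rng = random.Random(0)
dim, n_words, n_filters, window = 8, 6, 5, 3
# One row per segmented word; random values stand in for skip-gram vectors.
question = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
fc_weights = [[rng.uniform(-1, 1) for _ in range(n_filters)] for _ in CLASSES]
fc_bias = [0.0] * len(CLASSES)

probs = classify(question, filters, biases, fc_weights, fc_bias)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

With untrained weights the predicted label is meaningless; the point is only the data flow from a question matrix to a probability over the four categories. In practice the filters and fully connected weights would be learned from the labeled corpus.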
Keywords/Search Tags: Tibetan, Problem classification, Feature selection, CNN