| In recent years, government policy and funding support for the communications industry in Xinjiang has driven rapid growth in Uyghur websites and communication platforms, producing a large volume of Uyghur text that carries sentiment. At this scale, manual analysis alone cannot fully capture or effectively analyze the sentiment information, so a computer-based Uyghur sentiment analysis system is indispensable. Text sentiment classification, one of the core tasks of text sentiment analysis, assigns sentiment-bearing text to classes according to its sentiment polarity. This work targets four gaps in Uyghur sentiment classification: the lack of human-labeled resources, the lack of feature engineering research, the difficulty of obtaining labeled data, and the lack of comparisons between classification methods. First, to address the shortage of sentiment annotation resources, we built a Uyghur sentiment dictionary (UYSenti Dict) of more than 12,000 words using a semi-supervised method. We also built a sentiment-annotated Uyghur comment corpus and a sentiment-annotated literature corpus from two existing corpora, a movie subtitle corpus and a Weibo corpus; the corpora we prepared contain 9,000 and 600 sentences, respectively. Second, because feature representation for Uyghur sentiment classification has not been studied systematically, we evaluate the effect of different features on the four sentiment corpora above, extracting novel features and combined features on top of the traditional N-gram model. We compare the classification ability of these features using 4 feature selection methods, 5 feature weighting methods, and 2 machine learning classifiers, and identify the best-performing features, feature selection and weighting methods, and machine learning classifier for Uyghur text sentiment classification. Third, given the difficulty of obtaining labeled samples, we improve the existing lexicon-based classification method and combine it with a machine learning classifier. The Uyghur sentiment corpus is first classified with UYSenti Dict; when matching sentiment words, the matching unit is extended from the word form to the stem, and the influence of grammatical rules on sentence-level sentiment is taken into account. A machine learning classifier is then trained on pseudo-labeled data selected from the output of the lexicon-based classifier, and the remaining corpus is classified using the extracted optimal features. Because the method is not tied to a particular domain and needs no pre-labeled training data, it addresses the resource scarcity problem in Uyghur sentiment classification. Fourth, because Uyghur sentiment classification is still at an early stage, the effectiveness of the various existing classifiers needs to be compared. We downloaded a large unlabeled corpus, trained Uyghur word embeddings on it, and used the embeddings as features for binary classification with deep learning models (CNN, LSTM, and CNN+LSTM). During training we tuned the model parameters and selected the best model and its optimal parameters. We then compared traditional machine learning methods with deep learning approaches on Uyghur text sentiment classification. In summary, to arrive at a sound approach to Uyghur sentiment classification, we systematically compare and evaluate the existing classification methods, and we reduce the dependence on labeled corpora by combining general techniques with the specific characteristics of the Uyghur language. The results not only advance Uyghur text sentiment classification but can also be applied to sentiment classification for Kazakh and Kyrgyz. |
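
The two-stage idea described above (lexicon-based labeling with stem matching, followed by a classifier trained on the resulting pseudo-labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: English tokens stand in for Uyghur, the toy `LEXICON`, `NEGATORS`, and `SUFFIXES` are hypothetical placeholders for UYSenti Dict and the Uyghur grammar rules, the confidence `threshold` is invented, and a small Naive Bayes model is used only because the abstract does not name the machine learning classifiers.

```python
import math
from collections import Counter

# Hypothetical toy resources; English stand-ins for Uyghur stems and rules.
LEXICON = {"good": 1, "great": 1, "enjoy": 1, "bad": -1, "dull": -1, "fail": -1}
NEGATORS = {"not", "never"}
SUFFIXES = ("ing", "ed", "s")  # placeholder for Uyghur suffix stripping


def stem(token):
    """Strip one known suffix so inflected forms can match lexicon stems."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def lexicon_score(tokens):
    """Sum polarities; try the surface form first, then the stem.

    A negator flips the polarity of the next sentiment word (a deliberately
    simple stand-in for grammar-aware rules).
    """
    score, flip = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
            continue
        polarity = LEXICON.get(tok, LEXICON.get(stem(tok), 0))
        if polarity:
            score += -polarity if flip else polarity
            flip = False
    return score


def train_nb(labeled):
    """Multinomial Naive Bayes counts over unigrams (assumed classifier)."""
    counts, docs = {1: Counter(), -1: Counter()}, Counter()
    for tokens, label in labeled:
        counts[label].update(tokens)
        docs[label] += 1
    return counts, docs, set(counts[1]) | set(counts[-1])


def predict_nb(model, tokens):
    """Pick the label with the higher add-one-smoothed log probability."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_logp = None, -math.inf
    for label in (1, -1):
        logp = math.log((docs[label] + 1) / (total_docs + 2))
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in pseudo-labeled data
                logp += math.log((counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def classify_corpus(sentences, threshold=1):
    """Stage 1: lexicon labels; stage 2: classifier for the uncertain rest."""
    tokenized = [s.lower().split() for s in sentences]
    scores = [lexicon_score(t) for t in tokenized]
    confident = [(t, 1 if sc > 0 else -1)
                 for t, sc in zip(tokenized, scores) if abs(sc) >= threshold]
    model = train_nb(confident)  # trained on pseudo-labels only
    return [(1 if sc > 0 else -1) if abs(sc) >= threshold else predict_nb(model, t)
            for t, sc in zip(tokenized, scores)]


SENTENCES = [
    "great film enjoyed the actors",
    "good story great actors",
    "bad plot failed ending",
    "dull plot bad ending",
    "not good story",
    "superb actors in this film",  # no lexicon hit: left to the classifier
    "weak plot and ending",        # no lexicon hit: left to the classifier
]
labels = classify_corpus(SENTENCES)
print(labels)
```

The last two sentences contain no lexicon entries, so their labels come entirely from the classifier trained on the lexicon's own output, which is the property that lets this kind of pipeline run without hand-labeled training data.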