
Research On Some Key Techniques Of Uyghur Network Public Opinion Analysis

Posted on: 2020-10-01  Degree: Doctor  Type: Dissertation
Country: China  Candidate: M T A Y F Mai
GTID: 1525305885950449  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the network has become one of the main channels of information dissemination. In Xinjiang, online media such as news websites, forums, blogs, microblogs and WeChat have become the main platforms on which people express their views, wishes, emotions, appeals and attitudes. This subjective information not only reflects netizens' attitudes toward and demands on society, but also affects how public opinion develops. Information posted by netizens usually has a positive or negative tendency, and negative information can have a negative impact on society. How to use technical means to obtain useful public opinion information from massive data is therefore of both theoretical significance and practical value. This dissertation studies in depth four techniques used in network public opinion analysis: sentiment analysis, topic modeling, topic detection and named entity recognition. It proposes analysis methods that are more efficient than traditional methods and verifies their effectiveness with several evaluation metrics. The main work includes the following aspects:

(1) To address the lack of corpus resources for Uyghur sentiment classification and the lack of sentiment classification methods that work across multiple domains, a hybrid deep learning model with strong generalization ability is constructed. The model integrates multiple features and introduces an attention mechanism, taking the characteristics of the Uyghur language into account. Firstly, part-of-speech, syllable and position feature vectors are used to complement the word vector, and a mixed vector is generated by concatenating them. Secondly, the mixed vector is fed into a BiLSTM network to obtain the context-related information of the text. Thirdly, an attention mechanism is used to focus on the words that carry emotional information, and a CNN is used to obtain the local information with sentiment tendencies in the text. Finally, a softmax function is used to obtain the sentiment classification result. The experimental results show that the proposed hybrid model outperforms both single deep learning models and traditional machine learning methods on two-class and five-class Uyghur sentiment classification tasks, which verifies the validity of the proposed model.

(2) The traditional LDA probabilistic topic model considers only the co-occurrence rate of words in a text and ignores their semantic relations. To address this problem, this dissertation proposes the W2VJLDA model, which combines word vectors with an improved LDA probabilistic topic model. Firstly, the model uses the traditional LDA probabilistic topic model to obtain the topic-word distribution. Secondly, externally trained word vectors are added to the topic-word model so that word vectors and topic vectors are trained jointly. Thirdly, the conditional probability distribution of the topic-word model is redefined in terms of the word vectors and topic vectors, and the KL divergence between the new topic-word distribution and the initial topic-word distribution is minimized to generate the word vectors and the topic model. The validity of the word vectors, topic vectors and topic-word distribution model produced by W2VJLDA is verified using several evaluation metrics.

(3) Based on the concept of a topic seed, an improved Single-Pass topic clustering algorithm is proposed that retains the main idea of the original algorithm. The W2VJLDA-SP topic detection model is constructed by using multiple features as the input of the improved Single-Pass algorithm: the word vectors and topic vectors generated by the W2VJLDA model, and a named entity feature vector. Experimental results show that, on the same experimental data set, the W2VJLDA-SP model reduces both the false detection rate and the missed detection rate of topic detection compared with the traditional Single-Pass algorithm. This shows that the W2VJLDA-SP topic detection model improves the quality of topic clustering, is highly practical, and has high application value in network public opinion analysis.

(4) For the Uyghur named entity recognition task, this dissertation proposes a hybrid neural network model based on the BiGRU-CNN-CRF framework. On the UYNERDATA corpus provided by the laboratory, the model performs better than the dictionary-based method and traditional machine learning methods, which confirms the validity of the BiGRU-CNN-CRF framework for Uyghur named entity recognition. The model has already been applied in the Uyghur network public opinion analysis system, where it effectively identifies the names of people, places and institutions in text and thus supports the analysis ability of the system.

(5) Based on the in-depth study of key technologies such as text sentiment analysis, topic modeling, topic detection and named entity recognition, the Uyghur network public opinion analysis system is implemented.
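
For readers unfamiliar with the baseline referred to in item (3), the following is a minimal sketch of the traditional Single-Pass clustering idea, not the improved topic-seed variant proposed in the dissertation, whose details are not given in this abstract. The function names `cosine` and `single_pass`, the cosine similarity measure, the running-mean centroid update and the threshold value are all assumptions made for illustration. In the W2VJLDA-SP setting, each input vector would presumably be built from the word vector, topic vector and named entity features of a document.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.8):
    """Baseline Single-Pass clustering (illustrative sketch).

    Documents are processed once, in arrival order. Each document joins the
    most similar existing cluster if the similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns a list of clusters, each a
    list of document indices.
    """
    centroids = []  # one centroid vector per cluster
    members = []    # document indices per cluster
    for idx, vec in enumerate(vectors):
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Running mean: fold the new vector into the cluster centroid.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
        else:
            centroids.append(list(vec))
            members.append([idx])
    return members


# Hypothetical 2-dimensional document vectors, for illustration only.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = single_pass(docs, threshold=0.8)
```

Because each document is compared only against the clusters that exist when it arrives, the result of this baseline depends on input order and on the threshold. A threshold that is too low merges distinct topics (false detections), while one that is too high splits a topic (missed detections); these are the two error rates that item (3) reports as reduced.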
Keywords/Search Tags: public opinion analysis, sentiment classification, topic model, topic detection, named entity recognition