Research On Some Key Techniques Of Uyghur Network Public Opinion Analysis

Posted on:2020-10-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M T A Y F Mai

Full Text:PDF

GTID:1525305885950449

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, the network has become one of the main channels of information dissemination. In Xinjiang, online media such as news websites, forums, blogs, microblogs and WeChat have become the main platforms on which people express their views, wishes, emotions, appeals and attitudes. This subjective information not only reflects netizens' attitudes toward and demands on society but also affects how public opinion develops. Posts by netizens usually carry a positive or negative tendency, and negative information can have an adverse impact on society; therefore, using technical means to extract useful public opinion information from massive data is of both theoretical significance and practical value. This dissertation studies in depth the sentiment analysis, topic modeling, topic detection and named entity recognition techniques used in network public opinion analysis, proposes methods that perform better than traditional ones, and verifies their effectiveness with different evaluation metrics. The main work includes the following aspects:

(1) To address the scarcity of corpus resources for Uyghur sentiment classification and the lack of sentiment classification methods that work across multiple domains, a hybrid deep learning model with strong generalization ability is constructed by integrating multiple features and introducing an attention mechanism, taking the characteristics of the Uyghur language into account. First, part-of-speech, syllable and position feature vectors are used to complement the word vectors, and mixed vectors are generated by vector concatenation. Second, the mixed vectors are fed into a BiLSTM network to capture the contextual information of the text. Third, an attention mechanism focuses on the words that carry sentiment information, and a CNN extracts local features with sentiment tendencies. Finally, a softmax function produces the sentiment classification result. Experimental results show that the proposed hybrid model outperforms both simple deep learning methods and traditional machine learning methods on Uyghur binary (two-class) and five-class sentiment classification tasks, which verifies the validity of the model.

(2) The traditional LDA probabilistic topic model considers only the co-occurrence of words in a text and ignores their semantic relations. To address this, the dissertation proposes W2VJLDA, a model that combines word vectors with an improved LDA probabilistic topic model. First, the model uses traditional LDA to obtain the topic-word distribution. Second, externally trained word vectors are added to the topic-word model so that word vectors and topic vectors are trained jointly. Third, the topic-word conditional probability distribution is redefined in terms of word vectors and topic vectors, and the word vectors and topic model are generated by minimizing the KL divergence between this new topic-word distribution and the initial one. The validity of the word vectors, topic vectors and topic-word distributions produced by W2VJLDA is verified with different evaluation metrics.

(3) Based on the concept of a topic seed, an improved Single-Pass topic clustering algorithm is proposed that retains the main idea of the original algorithm. The W2VJLDA-SP topic detection model is then constructed by feeding multiple features, namely the word vectors and topic vectors generated by W2VJLDA and named entity feature vectors, into the improved Single-Pass algorithm. Experimental results show that, on the same data set, W2VJLDA-SP effectively reduces both the false detection rate and the missed detection rate of topic detection compared with the traditional Single-Pass algorithm. This indicates that the W2VJLDA-SP model improves the quality of topic clustering and is of practical value for network public opinion analysis.

(4) For Uyghur named entity recognition, a hybrid neural network model based on a BiGRU-CNN-CRF framework is proposed. On the UYNERDATA corpus provided by the laboratory it outperforms dictionary-based methods and traditional machine learning methods, confirming the validity of the BiGRU-CNN-CRF framework for Uyghur named entity recognition. The model has been deployed in a Uyghur network public opinion analysis system, where it effectively identifies person, location and organization names in text and thereby supports the analysis capability of the system.

(5) Building on the study of these key technologies (text sentiment analysis, topic modeling, topic detection and named entity recognition), a Uyghur network public opinion analysis system is implemented.
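
The feature concatenation and attention steps of the sentiment model in item (1) can be illustrated with a small stdlib-only sketch. It is not the dissertation's implementation: the toy dimensions, the helper names (`mixed_vector`, `attention_pool`) and the dot-product scoring against a query vector are assumptions chosen for illustration, and the BiLSTM and CNN layers themselves are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mixed_vector(word: Vector, pos: Vector, syllable: Vector, position: Vector) -> Vector:
    """Concatenate word, part-of-speech, syllable and position features into one input vector."""
    return word + pos + syllable + position


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Attention pooling over time steps (e.g. BiLSTM outputs).

    Each step h_t is scored as the dot product h_t . query (an assumed scoring
    function), the scores are normalised with softmax, and the pooled vector is
    the weighted sum of the steps. Returns (weights, pooled).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Hypothetical 2-d word vector + 2-d POS one-hot + 1-d syllable feature + 2-d position code.
    x = mixed_vector([0.2, 0.1], [1.0, 0.0], [0.5], [0.0, 1.0])
    print(len(x))  # 7
    # Three hypothetical hidden states; the second aligns best with the query.
    weights, pooled = attention_pool([[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]], [1.0, 0.0])
    print([round(w, 3) for w in weights])  # [0.107, 0.787, 0.107]
```

In a full model the attention-weighted representation would then pass through convolutional layers and a softmax classifier over the two or five sentiment classes.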
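
The third step of item (2) fits vectors so that a vector-based topic-word distribution stays close, in KL divergence, to the LDA topic-word distribution. The sketch below assumes the redefined distribution is a softmax over dot products between a topic vector and each word vector, a common choice in embedding-based topic models that may differ from the dissertation's formulation. For simplicity it keeps the word vectors fixed and updates only the topic vector; the helper names and toy data are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def embedding_topic_word(topic_vec: Vector, word_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """Assumed form: p(w | z) = exp(v_w . t_z) / sum_w' exp(v_w' . t_z)."""
    scores = {w: sum(a * b for a, b in zip(v, topic_vec)) for w, v in word_vecs.items()}
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_w p(w) * log(p(w) / q(w)); eps avoids taking the log of zero."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps)) for w, pw in p.items() if pw > 0)


def fit_topic_vector(phi: Dict[str, float], word_vecs: Dict[str, Vector],
                     topic_vec: Vector, lr: float = 0.5, steps: int = 200) -> Vector:
    """Gradient descent on KL(phi || q) with respect to the topic vector only.

    With q = softmax over v_w . t and phi summing to 1 over the vocabulary,
    the gradient is sum_w (q_w - phi_w) * v_w. Word vectors stay fixed here,
    a simplification of the joint training described in the text.
    """
    t = list(topic_vec)
    for _ in range(steps):
        q = embedding_topic_word(t, word_vecs)
        grad = [sum((q[w] - phi.get(w, 0.0)) * v[d] for w, v in word_vecs.items())
                for d in range(len(t))]
        t = [ti - lr * gi for ti, gi in zip(t, grad)]
    return t


if __name__ == "__main__":
    # Hypothetical LDA topic-word row phi_z and fixed 2-d word vectors.
    phi = {"a": 0.7, "b": 0.2, "c": 0.1}
    vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    t0 = [0.0, 0.0]
    before = kl_divergence(phi, embedding_topic_word(t0, vecs))
    t = fit_topic_vector(phi, vecs, t0)
    after = kl_divergence(phi, embedding_topic_word(t, vecs))
    print(round(before, 4), round(after, 4))  # KL shrinks after fitting
```

Because the objective is convex in the topic vector under this assumed parameterization, plain gradient descent with a small step size steadily reduces the divergence.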
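
Item (3) builds on Single-Pass clustering, which reads documents once in arrival order and either assigns each one to the most similar existing topic or, if no similarity reaches a threshold, opens a new topic. The sketch below is the generic algorithm with cosine similarity against a running centroid per topic. Treating that centroid as a stand-in for the "topic seed", the threshold value and the toy document vectors are assumptions; the abstract does not specify how seeds are built or how word-vector, topic-vector and named-entity features are combined.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs: List[Vector], threshold: float = 0.7) -> List[int]:
    """Assign each document, in order, to a topic id and return one id per document."""
    centroids: List[Vector] = []  # running mean per topic (stand-in for a topic seed)
    sizes: List[int] = []
    labels: List[int] = []
    for doc in docs:
        best, best_sim = -1, -1.0
        for k, c in enumerate(centroids):
            sim = cosine(doc, c)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Join the closest topic and update its centroid incrementally.
            n = sizes[best]
            centroids[best] = [(ci * n + di) / (n + 1) for ci, di in zip(centroids[best], doc)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No topic is similar enough: this document starts a new topic.
            centroids.append(list(doc))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(single_pass(docs, threshold=0.7))  # [0, 0, 1, 1, 0]
```

The threshold controls the trade-off: lowering it merges more documents into existing topics, while raising it splits stories into more topics.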
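
Item (3) reports results as a false detection rate and a missed detection rate. Under the usual TDT-style definitions, which are assumed here because the abstract does not define them, a reference topic is compared with the cluster mapped to it. The missed detection rate is the share of on-topic documents the cluster fails to include. The false detection (false alarm) rate is the share of off-topic documents the cluster wrongly includes. The dissertation may compute these per topic and then average them. The function name and data below are hypothetical.

```python
from typing import Set, Tuple


def miss_and_false_alarm(reference: Set[int], detected: Set[int],
                         all_docs: Set[int]) -> Tuple[float, float]:
    """Return (P_miss, P_fa) for one reference topic and its detected cluster.

    P_miss = on-topic documents not detected / on-topic documents
    P_fa   = off-topic documents detected    / off-topic documents
    """
    on_topic = reference & all_docs
    off_topic = all_docs - reference
    missed = len(on_topic - detected)
    false_alarms = len(detected & off_topic)
    p_miss = missed / len(on_topic) if on_topic else 0.0
    p_fa = false_alarms / len(off_topic) if off_topic else 0.0
    return p_miss, p_fa


if __name__ == "__main__":
    docs = set(range(10))
    # Reference topic holds documents 0-3; the detected cluster found 0, 1, 2 and wrongly 5.
    print(miss_and_false_alarm({0, 1, 2, 3}, {0, 1, 2, 5}, docs))  # (0.25, 0.1666...)
```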
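
In a BiGRU-CNN-CRF tagger such as the one in item (4), the recurrent and convolutional layers produce a score for every label at every token. The CRF layer then adds label-transition scores and decodes the highest-scoring tag sequence with the Viterbi algorithm. The sketch below covers only that decoding step. The BIO label set, the score values and the large negative penalty on the invalid O to I-PER transition are illustrative assumptions, and training of the transition scores is omitted.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the label sequence maximising the sum of emission and transition scores.

    emissions[t][y]: score of label y at token t (e.g. from BiGRU-CNN layers).
    transitions[(y_prev, y)]: score of moving from y_prev to y; missing pairs count as 0.
    Assumes at least one token.
    """
    score = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for y in labels:
            # Best previous label for ending in y at step t.
            prev = max(labels, key=lambda p: score[p] + transitions.get((p, y), 0.0))
            new_score[y] = score[prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            bp[y] = prev
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: score[y])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    labels = ["O", "B-PER", "I-PER"]
    emissions = [
        {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0},
        {"O": 0.5, "B-PER": 0.0, "I-PER": 1.0},
        {"O": 1.5, "B-PER": 0.2, "I-PER": 0.3},
    ]
    transitions = {("O", "I-PER"): -10.0}  # I-PER may not follow O
    print(viterbi(emissions, transitions, labels))  # ['B-PER', 'I-PER', 'O']
```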

Keywords/Search Tags:

public opinion analysis, sentiment classification, topic model, topic detection, named entity recognition

Related items

1	Cambodian Named Entity Recognition Based On The Topic Model Word Vector
2	Research On The Analysis Model Of English Composition Topic Opinion
3	Literary Named Entity Recognition And Its Application
4	Recognition Of Uyghur Musical Named Entity Based On CRF
5	Research On Chinese-Vietnamese Entity Alignment Technology Based On Named Entity Recognition
6	Named Entity Recognition For The Field Of Ancient Chinese
7	Research On Named Entity Recognition Based On Ancient Book Corpus
8	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
9	Across The English And Chinese Research Topic Detection And Tracking Technology
10	Research And Application Of Korean Named Entity Recognition Method Based On Multi-Granularity Fusion