| The popularity of the mobile Internet has brought explosive growth in user information, and the sentiment expressed in users’ public opinions is highly valued by enterprises and governments. Examples include fans’ comments on movies on Douban and users’ comments on trending events on Weibo. Mining user sentiment benefits the growth and operation of companies and governments. However, real-world text is often short, highly colloquial, and grammatically irregular, and it contains many typos and symbolic expressions. These problems make the text data incomplete and pose great challenges to the semantic analysis required for sentiment classification. Traditional methods for handling incomplete text data include deletion and filling: deletion wastes information, while filling introduces uncertain information. Most existing machine learning algorithms are designed for pre-processed text and are therefore not applicable to real incomplete text. Although a few algorithms can classify incomplete text directly, their classification performance drops severely when the original text contains a large amount of incompleteness. For sentiment classification of incomplete texts, the research work and contributions of this paper are as follows: 1. Following the order in which pre-training models were developed, the existing pre-training models are reviewed and summarized, and their shortcomings are pointed out. To address the insufficient feature expression ability of words in incomplete texts, this paper proposes a framework for sentiment classification of incomplete texts based on a pre-training model, which lays the foundation for the subsequent research on several issues in sentiment classification of incomplete texts. 2. Missing and erroneous words in incomplete texts blur the semantic representation of the text. Based on the 
pre-trained model framework, this paper combines a stacked denoising autoencoder to further refine the feature representation of words and improve classification performance on incomplete texts, extracting deep features to reconstruct the feature representations of missing and erroneous words. Theoretical analysis and experimental results show that the proposed method improves classification performance over current mainstream algorithms. 3. Most texts that express sentiment are short, containing no more than a few dozen words. Because the texts are short, it is difficult for a model to capture contextual semantic information, and such texts often suffer from colloquial expression, frequent typos, and the rapid emergence of new Internet words. The complex and diverse forms of short texts lead to poor results with traditional methods. Therefore, for short texts and incomplete short texts, this paper proposes a short text sentiment classification method that incorporates pre-trained word vectors. Based on the pre-training model framework, different pre-trained word vectors are obtained, and high-level feature vectors are then extracted with a text convolutional neural network. Finally, the two vectors are concatenated and fused to enrich the text semantics, thereby improving sentiment classification performance on short texts and incomplete short texts. Experiments on two popular short text sentiment datasets show that the method fusing pre-trained word vectors achieves higher sentiment classification accuracy than a single pre-trained model. |
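
The fusion step in the third contribution can be illustrated with a minimal sketch. The abstract does not specify which two vectors are concatenated, so this example assumes one is a mean-pooled pre-trained word vector and the other is a TextCNN-style feature vector (convolution over tokens, ReLU, max-over-time pooling). The function names `conv_max_pool`, `mean_pool`, and `fuse`, as well as the toy embeddings and kernels, are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Slide one kernel over the token axis, apply ReLU, then max-pool over time."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    # No padding is applied: a text shorter than the kernel yields 0.0.
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        best = max(best, activation)
    return best


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Average the token vectors into one sentence-level vector."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def fuse(embeddings: List[Vector], kernels: List[List[Vector]]) -> Vector:
    """Concatenate the pooled pre-trained vector with the convolutional features."""
    cnn_features = [conv_max_pool(embeddings, kernel) for kernel in kernels]
    return mean_pool(embeddings) + cnn_features  # list concatenation = splicing


# Toy input (assumed): three tokens with 2-dimensional pre-trained vectors.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two toy kernels (assumed): one spanning two tokens, one spanning a single token.
toy_kernels = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0]]]
fused = fuse(token_vectors, toy_kernels)
print(len(fused))  # 2 pooled dimensions + 2 convolutional features
```

In a real system the token vectors would come from a pre-trained model and the kernels would be learned; the fused vector would then be passed to a classifier layer.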