Against the background of the Healthy China strategy and the rapid development of Internet technology, pretrained models, and deep learning, traditional rule-based and machine learning methods for sentiment analysis consume substantial human effort, because they require continually building new sentiment dictionaries or engineering complex features. As the Internet grows, text data becomes ever larger in scale and more complex and changeable in form, and modes of expression shift from day to day, so traditional sentiment analysis has gradually failed to keep pace. To address these shortcomings, this paper uses data augmentation, a four-stage pretraining scheme, and deep learning to analyze the sentiment of netizens during public health emergencies. The main work and contributions of this study are as follows:

(1) Other researchers address class imbalance simply by oversampling with back translation. This study instead augments each class of the training set to a different degree, which both roughly balances the data across the three categories and improves the generalization ability of the model, thereby improving classification performance.

(2) BERT (Bidirectional Encoder Representations from Transformers) and ERNIE (Enhanced Representation through Knowledge Integration) models trained on general corpora, as used by other researchers, lack knowledge of public health emergencies. This study first uses 910,000 unlabeled samples to further pretrain BERT and ERNIE on in-domain data, yielding BERT2 and ERNIE2; it then uses 100,000 unlabeled samples to further pretrain BERT2 and ERNIE2 on in-task data, yielding BERT3 and ERNIE3; finally, 100,000 labeled samples are used to fine-tune BERT3 and ERNIE3. Compared with the conventional two-stage approach of pretraining followed by fine-tuning, this four-stage training method lets the pretrained model learn more prior knowledge of the epidemic domain, thereby improving classification quality.

(3) To overcome the limitations of classifying with a single deep learning model, this study combines BERT3 and ERNIE3 with Fully Connected Neural Networks, Convolutional Neural Networks, Bidirectional Long Short-Term Memory Networks, Recurrent Convolutional Neural Networks, and Deep Pyramid Convolutional Neural Networks to build 10 sentiment analysis models. The five models with the best classification performance are then combined by hard voting to exploit the strengths of each, achieving an accuracy of 89.64% and an F1 value of 0.8948.

(4) To verify the practical effectiveness of the proposed sentiment analysis model, this paper compares it with machine learning methods, baselines, and models proposed by other researchers. The experimental results show that the proposed model achieves better classification performance: its F1 value is 0.1648 and 0.0031 higher than those obtained by the Word2vec-HAN and BERT-BLCNN-Att models, respectively.
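
The hard-voting step described in contribution (3) can be illustrated with a minimal sketch. It assumes each of the five selected models has already produced one integer class label per sample; the function name `hard_vote`, the label encoding 0/1/2 for the three sentiment categories, the example predictions, and the tie-breaking rule (prefer the earliest model in the list) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over per-model label predictions.

    predictions[m][i] is the label that model m assigns to sample i.
    Ties are resolved in favour of the earliest model in the list
    (an assumed rule; the source does not specify one).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    # zip(*predictions) yields, for each sample, the tuple of votes from all models.
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        voted.append(next(label for label in sample_votes if counts[label] == top))
    return voted


# Hypothetical predictions from five models on four samples
# (labels 0, 1, 2 stand in for the three sentiment categories).
example_predictions = [
    [0, 1, 2, 1],
    [0, 1, 2, 0],
    [0, 2, 2, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
]
print(hard_vote(example_predictions))  # [0, 1, 2, 1]
```

With five voters and three classes, a 2-2-1 split is possible (as in the last sample above), so some tie-breaking rule is needed; ordering the models by validation performance and preferring the earliest vote is one simple choice.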