With the continuous development of technology, the volume of text on the Internet has increased dramatically, and its forms have become more diverse. To understand public preferences and the direction of public opinion, businesses, governments, and social organizations urgently need to classify text content by sentiment. Text sentiment classification uses a computer to divide text data into two categories, positive and negative; the category indicates the writer's emotional tendency and, in turn, their view of the subject. Because Cantonese speakers form a relatively large group worldwide, understanding the emotional tendencies expressed in Cantonese is of great significance. Sentiment classification of Simplified Chinese and English text already has relatively mature techniques and methods, but research results on sentiment classification of Cantonese text are lacking, so the sentiment tendency of Cantonese text needs to be studied. Cantonese text is relatively complex in composition, and the emotions it expresses are varied. Interference from traditional characters, variant characters, Cantonese-specific modal particles, and Cantonese word order and grammar limits common methods, which therefore cannot classify the sentiment of Cantonese text well. This study therefore focuses on sentiment classification methods for Cantonese text.

We first study a Cantonese text sentiment classification model based on the support vector machine (SVM), a traditional machine learning method, using the chi-square test to extract word features from the text. We hoped to obtain a model that classifies Cantonese text effectively by training it on the training dataset; however, the final experimental results show that the classification performance of this method is not satisfactory. After analyzing the specific reasons in detail, we propose an improvement plan that uses deep learning to solve the task: we study the Text CNN model and apply it to Cantonese text sentiment classification. Although its results are significantly better than those of the traditional machine learning method, they still fall short of expectations. Having summarized the advantages and disadvantages of this method, we turn to the pre-trained BERT model for the Cantonese text sentiment classification task. We examine in depth the input preprocessing of BERT and its two-stage training scheme of pre-training followed by fine-tuning, study the Transformer architecture on which BERT is built, and explore the multi-head self-attention mechanism used in the encoder and decoder at the core of the Transformer.

To improve the sentiment classification of Cantonese text, this study improves the BERT model in three ways. First, to handle Cantonese words that are out of vocabulary (unregistered) for the two-stage BERT model (BERT-2), we design a parameter adaptation method for the Cantonese pre-trained model based on changes to its vocabulary. Second, we design a method for constructing domain-specific Cantonese text representations from a unified pre-trained model, and use it to build a Cantonese text classification method based on a three-stage pre-trained BERT model (BERT-3), which completes the sentiment classification task through two rounds of transfer learning. Finally, we propose an input representation method for BERT downstream tasks on Cantonese text based on an auxiliary sentence-pair structure: constructing auxiliary sentences converts the sentiment classification task into a binary sentence-pair task, which addresses the problems of limited training data and weak task awareness for Cantonese text.

In the experiments, we compare the six Cantonese text sentiment classification models designed in this study on Cantonese text datasets from three domains: music reviews, movie reviews, and food reviews. The results show that, compared with the original two-stage BERT model (BERT-2), the improved three-stage pre-trained model (BERT-3) and the BERT-3-AA and BERT-3-APA models based on the auxiliary sentence-pair structure all improve the classification metrics to a certain extent. BERT-3-AA and BERT-3-APA reach an accuracy of up to 91.3% and an F-score of up to 91.0%, while BERT-3 reaches an accuracy of up to 89.6% and an F-score of up to 89.0%; relative to BERT-3, BERT-3-AA and BERT-3-APA thus raise accuracy by about 1.7 percentage points and the F-score by about 2.0 percentage points. In addition, BERT-3-AA and BERT-3-APA each show different strengths and weaknesses on different datasets.
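The chi-square feature selection used for the SVM baseline can be illustrated with a minimal sketch. It assumes pre-segmented documents and binary labels (1 = positive), scores each term with the standard 2x2 contingency-table statistic, and keeps the top `k`; the helper names and the toy reviews are hypothetical, not the thesis's code or data. In practice, scikit-learn's `SelectKBest` with the `chi2` score function followed by `LinearSVC` offers a comparable pipeline, although its `chi2` works on feature counts, so its scores need not match this presence-based version exactly.

```python
from collections import Counter
from typing import Dict, List, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 term/class contingency table.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (an empty row or column)
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs: List[List[str]], labels: List[int], k: int) -> List[Tuple[str, float]]:
    """Score every term against the binary labels (1 = positive) and keep the top k."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_df: Counter = Counter()  # document frequency within positive documents
    neg_df: Counter = Counter()  # document frequency within negative documents
    for tokens, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(set(tokens))  # count each term once per document
    scores: Dict[str, float] = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties broken by the term string so the result is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy pre-segmented Cantonese reviews (hypothetical data); 1 = positive, 0 = negative.
docs = [["好", "好聽"], ["好聽", "鍾意"], ["難聽", "唔"], ["唔", "鍾意"]]
labels = [1, 1, 0, 0]
top = select_features(docs, labels, k=2)
print(top)
```

Here "好聽" appears only in positive reviews and "唔" only in negative ones, so both get the maximum score, while "鍾意", which appears once in each class, scores zero. The selected terms would then become the feature space for the SVM.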
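The Text CNN model mentioned above extracts features by sliding filters of several widths over the word-embedding sequence and max-pooling each filter's response over time. The sketch below shows only that forward pass, under stated assumptions: the filter widths (2, 3, 4), two filters per width, random untrained weights, and a logistic output layer are illustrative choices, not the thesis's configuration, and training, dropout, and real embeddings are omitted.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Filter = Tuple[List[Vector], float]  # (width x dim weight rows, bias)


def conv_max_pool(embeddings: List[Vector], weights: List[Vector], bias: float) -> float:
    """Slide one filter over the sequence, apply ReLU, and max-pool over time."""
    width = len(weights)
    best = 0.0  # starting at 0.0 folds the ReLU into the running maximum
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x for w_row, x_row in zip(weights, window) for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def text_cnn_features(embeddings: List[Vector], filters: Dict[int, List[Filter]]) -> Vector:
    """Concatenate one pooled feature per filter, ordered by filter width."""
    return [
        conv_max_pool(embeddings, weights, bias)
        for width in sorted(filters)
        for weights, bias in filters[width]
    ]


def positive_probability(features: Vector, out_weights: Vector, out_bias: float) -> float:
    """Logistic output layer for the binary positive/negative decision."""
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


def random_filter(width: int, dim: int, rng: random.Random) -> Filter:
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)], 0.0


rng = random.Random(0)
dim, seq_len = 4, 6
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]  # stand-in embeddings
filters = {w: [random_filter(w, dim, rng) for _ in range(2)] for w in (2, 3, 4)}
features = text_cnn_features(sentence, filters)
prob = positive_probability(features, [0.1] * len(features), 0.0)
print(len(features), round(prob, 3))
```

Max-over-time pooling gives a fixed-length feature vector (one value per filter) regardless of sentence length, which is what lets a single classifier layer handle reviews of different lengths.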
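The multi-head self-attention mechanism explored above computes, for each head, softmax(QK^T / sqrt(d_k))V on a slice of the projected queries, keys, and values, then concatenates the heads and applies an output projection. The following pure-Python sketch assumes tiny random weights, no attention mask, and no residual connections, layer normalization, or dropout; BERT-base itself uses 12 heads over 768-dimensional hidden states, and, unlike the full Transformer, BERT uses only the encoder stack.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """softmax(Q K^T / sqrt(d_k)) V, returning the output and the attention weights."""
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                              w_o: Matrix, num_heads: int) -> Tuple[Matrix, List[Matrix]]:
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # project every token once
    outputs, all_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's slice of the projections
        out, weights = scaled_dot_product_attention(
            [row[cols] for row in q], [row[cols] for row in k], [row[cols] for row in v]
        )
        outputs.append(out)
        all_weights.append(weights)
    # Concatenate the heads token by token, then apply the output projection W_O.
    concat = [[value for head in outputs for value in head[i]] for i in range(len(x))]
    return matmul(concat, w_o), all_weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(42)
seq_len, d_model, num_heads = 3, 4, 2
x = random_matrix(seq_len, d_model, rng)  # stand-in token representations
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(len(out), len(out[0]), [round(sum(r), 6) for r in weights[0]])
```

Each head attends over the whole sequence with its own slice of the projections, so different heads can weight different token relationships; every row of attention weights sums to 1.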
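The abstract does not detail the vocabulary-based parameter adaptation for out-of-vocabulary Cantonese words, so the following is only a sketch of one common approach, assumed here: append the missing tokens to the vocabulary and grow the embedding matrix with one row per new token, initialized to the mean of the existing rows. The toy vocabulary, embedding values, and the choice of "嘅" as an unknown character are hypothetical. Character-level splitting mirrors how BERT's Chinese tokenizer treats CJK characters as individual tokens. In the Hugging Face Transformers library, the corresponding steps are `tokenizer.add_tokens(...)` and `model.resize_token_embeddings(len(tokenizer))`.

```python
from typing import Dict, List

Vector = List[float]


def tokenize_chars(text: str, vocab: Dict[str, int], unk: str = "[UNK]") -> List[str]:
    """Split text into single characters and map anything outside the vocabulary to [UNK]."""
    return [ch if ch in vocab else unk for ch in text if not ch.isspace()]


def extend_vocab(vocab: Dict[str, int], embeddings: List[Vector], new_tokens: List[str]) -> int:
    """Append unseen tokens; each new embedding row is the mean of the pre-existing rows.

    Assumes token ids are 0..len(embeddings)-1, so a new token's id is its row index.
    """
    dim = len(embeddings[0])
    mean_row = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    added = 0
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(embeddings)
            embeddings.append(list(mean_row))
            added += 1
    return added


vocab = {"[PAD]": 0, "[UNK]": 1, "唔": 2, "好": 3, "聽": 4}
embeddings = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.4], [0.4, 0.2], [0.3, 0.3]]
before = tokenize_chars("唔好聽嘅", vocab)
added = extend_vocab(vocab, embeddings, ["嘅", "好"])  # "好" is already known and is skipped
after = tokenize_chars("唔好聽嘅", vocab)
print(before, after, added)
```

Mean initialization places the new tokens near the centre of the existing embedding space, so further Cantonese pre-training starts from a neutral point rather than from an arbitrary random vector; other initializations are equally plausible.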
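The auxiliary sentence-pair input described above turns single-text sentiment classification into a binary sentence-pair task. The abstract does not specify how the BERT-3-AA and BERT-3-APA variants word or combine their auxiliary sentences, so this sketch only shows the general conversion under assumed choices: one hypothetical Cantonese auxiliary sentence per label ("This is a positive/negative review"), the standard `[CLS] text [SEP] aux [SEP]` layout with segment ids 0 and 1, a target of 1 when the auxiliary sentence matches the true label, and prediction by the highest match probability.

```python
from typing import Dict, List, Tuple

# Hypothetical auxiliary sentences, one per sentiment label ("This is a positive/negative review").
AUX_SENTENCES: Dict[str, str] = {
    "positive": "呢個係正面評價",
    "negative": "呢個係負面評價",
}


def build_pair_input(text: str, aux: str) -> Tuple[List[str], List[int]]:
    """Character-level BERT pair input: [CLS] text [SEP] aux [SEP], with segment ids 0 then 1."""
    first = ["[CLS]"] + list(text) + ["[SEP]"]
    second = list(aux) + ["[SEP]"]
    return first + second, [0] * len(first) + [1] * len(second)


def to_binary_pairs(text: str, label: str) -> List[Tuple[List[str], List[int], int]]:
    """One (tokens, segment_ids, target) example per auxiliary sentence; target 1 iff it matches."""
    examples = []
    for name, aux in AUX_SENTENCES.items():
        tokens, segments = build_pair_input(text, aux)
        examples.append((tokens, segments, 1 if name == label else 0))
    return examples


def pick_label(match_probs: Dict[str, float]) -> str:
    """At inference, choose the label whose auxiliary sentence scores the highest match probability."""
    return max(match_probs, key=match_probs.__getitem__)


pairs = to_binary_pairs("好好聽", "positive")
for tokens, segments, target in pairs:
    print(" ".join(tokens), segments, target)
print(pick_label({"positive": 0.82, "negative": 0.11}))
```

Because each labelled review yields one pair per auxiliary sentence, the conversion multiplies the number of training examples and states the task explicitly in the second segment, which is the intuition behind using it when Cantonese training data is limited.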
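The evaluation figures above are accuracy and F-score. The abstract does not state whether its F value is the positive-class F1 or a macro average, so this sketch assumes positive-class F1; the predictions are made-up toy values. It also checks that the reported gains are differences in percentage points: 91.3 minus 89.6 and 91.0 minus 89.0.

```python
from typing import List, Tuple


def accuracy_and_f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> Tuple[float, float]:
    """Accuracy over all samples and F1 of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def percentage_point_gap(new: float, old: float) -> float:
    """Difference between two scores already expressed in percent, rounded to one decimal."""
    return round(new - old, 1)


acc, f1 = accuracy_and_f1([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0])
print(acc, f1)  # 0.75 0.75
print(percentage_point_gap(91.3, 89.6), percentage_point_gap(91.0, 89.0))  # 1.7 2.0
```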