| News text classification has long been an active topic in natural language processing. Because COVID-19 news is complex and its categories are often unclear, this dissertation analyzes website and webpage information to collect COVID-19-related news and build a COVID-19 news corpus. On this corpus it studies machine learning methods, deep learning methods, and fusion models, compares their classification performance, and then evaluates and predicts news text categories. The study is divided into three parts. (1) Data preprocessing methods are summarized, and the COVID-19 news is preprocessed after data screening and cleaning. Based on the characteristics of COVID-19 epidemic news and the news categories used on the Internet, the classification basis is established and a COVID-19 epidemic dataset is created. The classification performance of machine learning methods on the COVID-19 news dataset is then studied: the importance of the TF-IDF and Word2vec text representations for text classification is explored, and four machine learning classifiers (support vector machine, naive Bayes, logistic regression, and random forest) are evaluated. The combination of TF-IDF and a support vector machine reaches a classification accuracy of 84% on the COVID-19 news dataset. (2) Because deep learning methods perform best among many classification models, the classification performance of single deep learning models on the COVID-19 news dataset is studied. Parameter comparison experiments that vary the model parameter values verify the importance of parameters to the deep learning models, and the optimal parameter settings are selected to study how well three deep learning models, TextCNN, BiLSTM, and BiGRU, capture complete information. The model comparison experiment shows that BiGRU reaches a classification accuracy of 86% on the COVID-19 epidemic dataset, 3 and 1 percentage points higher than TextCNN and BiLSTM, respectively, which verifies the strong performance of deep learning methods in text classification. (3) The text features obtained by Word2vec are static word vectors, which cannot resolve polysemy, and a single machine learning or deep learning model cannot fully capture semantic information during classification. To address this, contextual semantic features are extracted through the deep bidirectional representation of BERT, TextCNN is used to obtain local text features, and BiGRU is used to capture word order and contextual dependencies between words. A BERT-based fusion model of TextCNN and BiGRU is constructed. Experimental results show that the fusion model reaches an average classification accuracy of 87.8% on the COVID-19 dataset, 1.8 percentage points higher than BiGRU, the best single deep learning model, and 1.6 and 3.7 percentage points higher than the other BERT-based models, BERT-TextCNN and BERT-BiGRU, respectively, which verifies the advantages of the proposed fusion model for text classification. |
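
The TF-IDF representation mentioned in part (1) can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant was used, so the code below assumes the plain textbook form (term frequency times the natural log of inverse document frequency, with no smoothing); library implementations often apply smoothing and normalization instead. The function name `tfidf` and the toy headlines are hypothetical and are not taken from the dissertation's corpus.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy headlines, used only to exercise the function.
docs = [
    "vaccine rollout begins in city".split(),
    "city reports new cases".split(),
    "vaccine trial reports results".split(),
]
weights = tfidf(docs)
```

In this sketch a term that appears in only one headline, such as "rollout", receives a larger weight than a term shared across headlines, such as "vaccine". That is the property that makes TF-IDF vectors useful as input to a classifier such as a support vector machine.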