| The digital economy has become a key driver of social development, and it depends on the communication and sharing of information over network platforms. The rapid growth of modern information networks has produced a large volume of valuable information, in which text data has always occupied a core position. Classifying massive text data effectively in order to mine the valuable information it contains can, to a large extent, address the problem of managing this information, and is therefore a topic of great significance. Traditional text representation methods cannot accurately represent text semantics because of the semantic gap between words. In recent years, the introduction of deep learning has further improved the effectiveness and performance of text classification. As the volume of text data grows, traditional classification techniques can no longer meet demand, and big data computing engines provide an effective solution for processing and analyzing massive data. This paper studies technologies for Chinese text classification based on deep learning. The main work is as follows: (1) To address the problem that word vector representations in deep-learning-based Chinese text classification cannot make full use of knowledge information, a text classification method based on a knowledge-enhanced semantic representation model is proposed. First, the ERNIE pre-trained language model is integrated with an external knowledge graph to obtain a semantically richer distributed text representation. Then a deep convolutional neural network is introduced to further extract contextual encoding features and obtain a deeper text feature representation. The results show that the proposed ERNIE-DPCNN algorithm is better suited to short text classification than BERT-based deep learning text classification models in terms of accuracy, recall, and F1 score. (2) As the volume of text data increases, traditional classification technology can no longer meet demand. Spark is a computing framework for massive data. Taking massive Chinese text data as the research object, this paper studies deep-learning-based Chinese text classification on the Spark platform. A Spark NLP pipeline is constructed on the BERT pre-trained deep learning model, and a logistic regression algorithm optimized by L-BFGS is introduced to perform text classification. Comparative experiments on multiple datasets show that the proposed BERT-LR model is feasible for text classification and improves several evaluation metrics to some extent. Further analysis shows that the model is better suited to large datasets with balanced class distributions. |
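
The classification step of the BERT-LR model, logistic regression fitted with L-BFGS on fixed text embeddings, can be illustrated with a minimal sketch. This is not the paper's Spark NLP pipeline. It is a plain-Python toy that assumes each text has already been encoded as a feature vector. The hand-written 2-D vectors below are stand-ins for BERT embeddings, and all names (`loss_and_grad`, `lbfgs`, `predict`, `X_train`) and settings (the L2 strength, the history size `m`, the backtracking line search) are hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(w, X, y, l2=0.01):
    """Mean logistic loss plus an L2 penalty, and its gradient."""
    n = len(X)
    loss = 0.5 * l2 * dot(w, w)
    grad = [l2 * wi for wi in w]
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        # Stable form of log(1 + exp(z)) - y*z (negative log-likelihood).
        loss += (max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z) / n
        p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
        for j, xij in enumerate(xi):
            grad[j] += (p - yi) * xij / n
    return loss, grad

def lbfgs(f, w0, m=5, max_iter=100, tol=1e-6):
    """Minimize f (returning loss and gradient) with limited-memory BFGS."""
    w = list(w0)
    loss, g = f(w)
    history = []  # up to m recent (s, yv, rho) triples, oldest first
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: approximate the product (inverse Hessian) * g.
        q, alphas = list(g), []
        for s, yv, rho in reversed(history):
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, yv)]
        if history:
            s, yv, _ = history[-1]
            gamma = dot(s, yv) / dot(yv, yv)
            q = [gamma * qi for qi in q]
        for (s, yv, rho), a in zip(history, reversed(alphas)):
            b = rho * dot(yv, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        d = [-qi for qi in q]
        # Backtracking line search with the Armijo sufficient-decrease test.
        step, slope = 1.0, dot(g, d)
        while True:
            w_new = [wi + step * di for wi, di in zip(w, d)]
            new_loss, new_g = f(w_new)
            if new_loss <= loss + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        yv = [a - b for a, b in zip(new_g, g)]
        sy = dot(s, yv)
        if sy > 1e-12:  # keep only pairs with positive curvature
            history.append((s, yv, 1.0 / sy))
            if len(history) > m:
                history.pop(0)
        w, loss, g = w_new, new_loss, new_g
    return w

def predict(w, x):
    return 1 if dot(w, x) > 0 else 0

# Toy stand-ins for sentence embeddings; the trailing 1.0 is a bias feature.
X_train = [[0.1, 0.2, 1.0], [0.3, 0.1, 1.0], [0.2, 0.4, 1.0], [0.4, 0.3, 1.0],
           [0.9, 0.8, 1.0], [0.7, 0.9, 1.0], [0.8, 0.6, 1.0], [0.6, 0.7, 1.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

w = lbfgs(lambda v: loss_and_grad(v, X_train, y_train), [0.0, 0.0, 0.0])
predictions = [predict(w, x) for x in X_train]
print(predictions)
```

In a distributed setting the per-sample loop in `loss_and_grad` is the part that would be computed in parallel across partitions, while the L-BFGS update itself only needs the aggregated loss and gradient.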