Research Of Chinese Text Classification Based On Deep Learning

Posted on:2022-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Bi

Full Text:PDF

GTID:2518306734487594

Subject:Applied Statistics

Abstract/Summary:

The digital economy has become a key driver of social development, and it depends on the communication and sharing of information over network platforms. The rapid growth of modern information networks has produced a large amount of valuable information, in which text data occupies a core position. How to classify massive text data effectively and extract the valuable information it contains is therefore a topic of great significance. Traditional text representation methods cannot accurately represent text semantics because of the semantic gap between words. In recent years, the introduction of deep learning has further improved the performance of text classification. As the volume of text data grows, traditional classification techniques can no longer meet demand, and big data computing engines provide an effective solution for processing and analyzing massive data. This thesis studies techniques for Chinese text classification based on deep learning. The main work is as follows:

(1) To address the problem that word vector representations in deep-learning-based Chinese text classification cannot make full use of knowledge information, a text classification method based on a knowledge-enhanced semantic representation model is proposed. First, the ERNIE pretrained language model is integrated with an external knowledge graph to obtain a semantically richer distributed text representation. Then a deep convolutional neural network is introduced to further extract the encoding features of the context and obtain a deeper level of text feature expression. The results show that the proposed ERNIE DPCNN algorithm is more suitable for short text classification than BERT-based deep learning text classification models in terms of accuracy, recall, and F₁ value.

(2) As the volume of text data increases, traditional classification techniques can no longer meet demand. Spark is a computing framework for massive data. This thesis takes massive Chinese text data as the research object and studies deep-learning-based Chinese text classification on the Spark platform. A Spark NLP pipeline is constructed on the BERT pretrained model, and a logistic regression algorithm optimized by L-BFGS is introduced to perform the classification. Comparative experiments on multiple data sets show that the proposed Bert-LR model is feasible for text classification and improves several evaluation metrics to some extent. Moreover, the analysis shows that the model is better suited to large data sets with balanced classes.
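
The abstract compares models by accuracy, recall, and F₁ value but does not say how these are averaged across classes. The following is a minimal sketch of one common convention, macro averaging, in plain Python. The function name `macro_scores`, the example labels, and the choice of macro averaging are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for two equal-length label lists.

    Macro averaging is an assumption here: each class contributes equally,
    regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    # Per-class counts of true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1s = [], []
    for c in labels:
        pred_count = tp[c] + fp[c]
        true_count = tp[c] + fn[c]
        precision = tp[c] / pred_count if pred_count else 0.0
        recall = tp[c] / true_count if true_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical labels for a two-class short-text task.
y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
accuracy, recall, f1 = macro_scores(y_true, y_pred)
```

On imbalanced data, macro-averaged scores can differ noticeably from accuracy, which is one reason a comparison across data sets usually reports all three.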

Keywords/Search Tags:

natural language processing, deep learning, Chinese text classification, Spark

Related items

1	Intelligent Device Text Classification Method Based On Natural Language Processing
2	Research On Adversarial Examples For Chinese Text Classification Models
3	Research On Text Classification Based On Deep Neural Network
4	Research On Text Classification Based On Natural Language Processing And Machine Learning
5	Research On Deep Learning Methods For Text Classification Tasks
6	Research And Application Of Text Classification Based On Deep Learning
7	Research And Analysis Of Text Classification Theory Based On Deep Learning
8	Research And Application Of Text Classification Based On Natural Language Processing
9	Research On Text Classification Algorithm Based On Deep Learning Method
10	Research On Financial Text Classification Method Based On Deep Learning