The Research On Short Chinese Text Classification Technology Based On Deep Learning

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2518306047485454
Subject: Communication and Information System
Abstract/Summary:
In recent years, more and more people publish and obtain information on the Internet through mobile phone applications, and much of this information takes the form of Chinese short texts. These texts fall mainly into two categories. The first is descriptive text that usually covers multiple topics; a single Zhihu question, for example, may concern "economy", "culture" and "travel" at once. Classifying such texts with multiple labels is of great significance for improving user experience in categorized display and information retrieval. The second category expresses subjective emotions and opinions, such as Taobao product reviews and discussions of trending social topics. Classifying the sentiment polarity of these texts helps collect user feedback on products and merchants, and also helps government departments understand public sentiment and manage online public opinion at the macro level.

Chinese short texts are short, sparse in features, varied in topic, and colloquial in expression. Conventional text classification algorithms based on manual feature analysis cannot be applied to them directly, yet relatively few studies focus specifically on Chinese short text classification. In recent years, the ability of deep learning to automatically learn and extract high-dimensional features has attracted many scholars to apply it to text analysis. This thesis therefore focuses on deep learning-based Chinese short text classification. The main work is as follows.

First, this thesis surveys and summarizes the current state of research on text classification in China and abroad, and introduces the main technologies involved in deep learning-based text classification. For text representation, it studies the classic Word2Vec model and the more recent BERT model, which can address the problem of polysemy (one word with several meanings). For feature extraction, it studies the classic CNN and RNN families of models. These models are then applied, according to the needs of each scenario, in the schemes proposed in this thesis.

Second, for multi-label classification of Chinese short texts, a deep learning-based scheme is proposed. The scheme uses BERT and Word2Vec to represent the text as vectors at different granularities. The Word2Vec word vectors are aggregated into a sentence vector by Attention and CNN models, while BERT sentence vectors are obtained directly with existing tools. The two sentence vectors are then concatenated as global features for multi-label classification. Experimental results show that the proposed scheme outperforms the traditional algorithm on multi-label Zhihu text.

Third, for sentiment polarity classification of Chinese short texts, a deep learning-based scheme is proposed. BERT is used to produce word vector representations of the text, which are fed into a BiGRU to extract global semantic information. An Attention mechanism then picks out the main emotion words, and its output is passed to a classifier that judges the sentiment polarity. Experimental results show that the scheme performs well on every evaluation metric.
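The multi-label scheme described in the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis implementation: the CNN branch is omitted, the "BERT" sentence vector is a hard-coded placeholder, and the query vector, label weights, and the use of one sigmoid per label with a 0.5 threshold are all assumptions chosen only to show how attention pooling, concatenation, and independent per-label decisions fit together.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Aggregate word vectors into one sentence vector.

    Each word is scored by its dot product with `query` (a hypothetical
    learned vector); the scores are normalized with softmax and used as
    weights in a weighted sum.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    pooled = [
        sum(weights[i] * word_vectors[i][d] for i in range(len(word_vectors)))
        for d in range(dim)
    ]
    return pooled, weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_predict(features, label_weights, label_biases, threshold=0.5):
    """One independent sigmoid per label (assumed output layer)."""
    probs = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(label_weights, label_biases)
    ]
    return [p >= threshold for p in probs], probs


# Toy inputs: three 2-dimensional "Word2Vec" vectors (made-up values).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # hypothetical attention query
pooled, weights = attention_pool(word_vectors, query)

# Placeholder for a sentence vector that a BERT encoder would supply.
bert_sentence_vector = [0.5, -0.5]

# Concatenate the two sentence vectors into the global feature vector.
features = pooled + bert_sentence_vector

# Hypothetical classifier parameters for three labels.
label_weights = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
    [-3.0, 0.0, 0.0, 0.0],
]
label_biases = [0.0, 0.0, 0.0]
labels, probs = multi_label_predict(features, label_weights, label_biases)
print(labels)
```

Because each label has its own sigmoid, a text can receive zero, one, or several labels, which is what distinguishes the multi-label setting from single-label classification with a softmax output.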
Keywords/Search Tags: Deep Learning, Chinese Short Text, Multi-label Classification, Sentiment Polarity Classification