Deep Learning Short Text Classification Algorithm Based on Feature Extension

Posted on: 2019-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: H X Chen
GTID: 2428330572450932
Subject: Computer Science and Technology
Abstract/Summary:
Text classification is an effective way to manage large volumes of text, and it is by now a well-developed technology. In recent years, with the rapid development of the Internet and computer technologies, social platforms such as Weibo and WeChat have emerged, and short texts have become the main way people communicate and acquire everyday information. The volume of short-text data is growing explosively. Processing short texts such as Weibo posts and comments to mine the value behind the data has practical significance and application value for users, businesses, and even government and scientific research personnel, and short text classification is one of the research directions that serves this goal. Short texts have sparse features and a limited ability to convey information, so it is impractical to apply traditional text classification algorithms to them directly. Although much research has addressed short text classification in recent years, and the Weibo platform itself offers a classification function, Weibo short texts continue to grow explosively as technology develops, so research on the topic is not out of date. In addition, short text classification technology still has room for improvement.

To address the high sparsity and weak expressive power of short-text features, this thesis first extends the features of a short text and then performs classification. It introduces deep learning into the classification task and uses a convolutional neural network as the classifier. The thesis studies the short text classification process in detail, including crawling, text preprocessing, Chinese word segmentation, feature extension and selection, and classifier training and classification. An improved Apriori algorithm scans the data set to obtain association rules, and the features are then extended according to those rules. To ensure that the feature association rules cover the sample data of every category, the rules are computed separately for each category. High-frequency words that carry no useful information and act as noise in short texts are filtered out with a stop-word list during text preprocessing.

For the classification stage, based on experimental analysis, the classifier is designed as a convolutional neural network built from a stack of three smaller convolution kernels, which not only maintains the classification performance of the network but also recovers and represents features better. During each training iteration, the weights of each layer are updated according to the computed error term until training is complete. Experiments show that, compared with traditional machine learning methods such as support vector machines, Bayesian networks, and decision trees, the feature-extended convolutional neural network model proposed in this thesis improves the classification accuracy on Weibo short texts. It also shows a clear advantage over another method that likewise studies Sina Weibo.
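The feature-extension step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Apriori algorithm was improved, how the rules are applied at prediction time, or which thresholds were used, so everything below is an assumption made for illustration: plain Apriori restricted to word pairs, hypothetical function names (`mine_pair_rules`, `mine_rules_per_category`, `merge_rules`, `extend_features`), toy English tokens in place of segmented Chinese text, and a merged rule table applied to any input text.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs, min_support=0.6, min_confidence=0.9):
    """Mine rules {a} -> {b} from tokenized docs (plain Apriori, itemsets of size <= 2)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    # Pass 1: count single words and keep the frequent ones.
    word_counts = Counter(w for s in doc_sets for w in s)
    frequent = {w for w, c in word_counts.items() if c / n >= min_support}
    # Pass 2: candidate pairs are built only from frequent words (Apriori pruning).
    pair_counts = Counter()
    for s in doc_sets:
        pair_counts.update(combinations(sorted(s & frequent), 2))
    rules = {}
    for (a, b), c in pair_counts.items():
        if c / n < min_support:
            continue
        # Test both directions; confidence(x -> y) = support(x, y) / support(x).
        for x, y in ((a, b), (b, a)):
            if c / word_counts[x] >= min_confidence:
                rules.setdefault(x, set()).add(y)
    return rules


def mine_rules_per_category(labeled_docs, **thresholds):
    """Mine rules separately for each category so that every category is covered."""
    by_category = {}
    for label, tokens in labeled_docs:
        by_category.setdefault(label, []).append(tokens)
    return {label: mine_pair_rules(docs, **thresholds)
            for label, docs in by_category.items()}


def merge_rules(rules_by_category):
    """Assumption: union all per-category rules into one table used for any text."""
    merged = {}
    for rules in rules_by_category.values():
        for x, ys in rules.items():
            merged.setdefault(x, set()).update(ys)
    return merged


def extend_features(tokens, rules):
    """Append the consequent of every rule whose antecedent occurs in the text."""
    present = set(tokens)
    extra = []
    for t in tokens:
        for y in sorted(rules.get(t, ())):
            if y not in present:
                present.add(y)
                extra.append(y)
    return list(tokens) + extra


# Toy corpus (hypothetical data, already segmented and stop-word filtered).
labeled = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["match", "goal", "fans"]),
    ("sports", ["match", "team", "coach"]),
    ("tech", ["phone", "chip", "launch"]),
    ("tech", ["phone", "chip", "price"]),
    ("tech", ["phone", "screen", "launch"]),
]
rules = merge_rules(mine_rules_per_category(labeled))
extended = extend_features(["goal", "tonight"], rules)
print(extended)  # ['goal', 'tonight', 'match']
```

Mining per category keeps a small category from being drowned out by the support counts of a larger one, which matches the coverage motivation given in the abstract. The extended token list would then be passed to the classifier in place of the original sparse text.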
Keywords/Search Tags: short text classification, feature extension, association rules, deep learning, convolutional neural network