| With the arrival of the big data era, the rapid development of the Internet and cloud computing, and the explosive growth in the number of smart-device users, many forms of short text have emerged online, such as news headlines, chat records, and Weibo comments. Short text contains a large amount of valuable information that is of great research and application value to academia and industry and can bring more convenience to people's study and daily life. However, unlike traditional text, short text is irregular, has sparse features, and carries little information, so it cannot provide sufficient context. Traditional machine learning methods simply convert short text into high-dimensional, highly sparse vectors for classification, which makes feature extraction more difficult and leads to poor classification results. In addition, the high-quality short text datasets available for training are limited in size, so models are prone to overfitting. A method with better performance and higher classification accuracy is therefore urgently needed. Some scholars have applied deep learning to natural language processing and achieved major breakthroughs; among these models, the convolutional neural network is the most representative and performs well in classification tasks. This paper therefore studies short text classification with convolutional neural networks. The main work is as follows: (1) To address the ambiguous topics and limited classification accuracy caused by the low information content and sparse features of short text, this paper proposes a short text classification method based on a convolutional neural network and a knowledge graph. First, words that can represent the short text are extracted with TF-IDF to form a keyword sequence. Then, entity linking is performed between the keyword sequence and the knowledge graph to obtain a set of entity concepts, adding contextual background
to the short text and expanding its features. Finally, the character vectors of the short text and the entity concepts are fed into the convolutional neural network together to classify the short text effectively. Experimental results show that the proposed method outperforms the traditional model on the five short text datasets. (2) To address the tendency to overfit when classifying short text with small samples, this paper proposes a short text classification method based on a convolutional neural network and improved data augmentation. This method augments and expands the training corpus at the word level and the sentence level, and uses a convolutional neural network as the classification model to extract corpus features. The word-level method is synonym replacement: after the text is segmented, word vectors are used to compute the cosine distance between each word and every word in the corresponding thesaurus, and the closest words are selected as replacements. The sentence-level method is back-translation: the number of back-translation rounds is increased, and cosine similarity is computed to select similar texts. Experimental results show that the proposed method expands the corpus effectively and performs very well in the short text classification task. This paper thus proposes two short text classification methods that address the sparse features of short text and model overfitting. Comparative experiments verify that both methods are effective and feasible. The paper has 18 figures, 7 tables, and 55 references. |
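
The two text-processing steps named in the abstract, TF-IDF keyword extraction and word-level synonym selection by cosine similarity, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`tfidf_keywords`, `nearest_synonym`), the plain `tf * log(N / df)` weighting, and the toy two-dimensional word vectors are all assumptions made for illustration. The abstract does not specify the segmenter, the thesaurus, the word-vector model, or the exact TF-IDF variant, and the sketch selects by highest cosine similarity, which is equivalent to smallest cosine distance.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document.

    Assumes docs is a list of token lists (already segmented).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_synonym(word, candidates, vectors):
    """Pick the thesaurus candidate whose vector is most similar to the word's vector."""
    return max(candidates, key=lambda c: cosine_similarity(vectors[word], vectors[c]))


if __name__ == "__main__":
    # Hypothetical toy corpus of three segmented short texts.
    docs = [
        ["stock", "market", "falls", "today"],
        ["team", "wins", "match", "today"],
        ["market", "rally", "lifts", "stock"],
    ]
    print(tfidf_keywords(docs, top_k=1))

    # Hypothetical 2-dimensional word vectors; real embeddings would be pretrained.
    vectors = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1]}
    print(nearest_synonym("good", ["great", "bad"], vectors))
```

In a full pipeline, the extracted keywords would then be linked to knowledge graph entities, and the selected synonyms would be used to generate additional training sentences. Both of those steps depend on resources the abstract does not name, so they are left out of the sketch.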