| With the rapid development of the Internet and related technologies, data now takes many forms. Text, one of the most common, carries the information that people send and receive. Short text appears in many forms in daily life and is widely used in scenarios such as title classification and public opinion analysis, and short text classification is one of the most popular tasks in Natural Language Processing. Title-type short text is a typical example: it is brief, but it usually summarizes the full text. Classifying such titles helps people obtain information efficiently and grasp the content of a text quickly, which in turn supports further processing of that information. However, because short text has a small vocabulary, its feature vectors are highly sparse, and directly applying traditional long-text processing methods to them rarely gives good results. To improve short text classification, this paper expands the features of title-type short text to compensate for the small amount of information it carries. Because the LDA word expansion used in the existing literature is insensitive to category information and inefficient owing to topic dispersion, this paper presents a short text expansion technique based on a topic model and a knowledge graph, combined with a word vector model for classification. The method first trains an LDA model and a FastText model on an external corpus. It then builds a core topic set containing category information from the trained LDA model, which yields extension words at the topic level. In parallel, based on the knowledge graph, it applies two rounds of filtering, using a modified TF-IDF and a modified chi-square statistic, to build a core concept word set containing category information, which yields extension words at the concept level. These extension words carry topical and conceptual information, so they enrich the short text to a certain extent and discriminate strongly between categories. The expanded text is then represented with the word vector model, and a classification model is built and evaluated. Experiments show that the proposed method achieves good classification results on title-type short text classification tasks, which indicates that word expansion based on the LDA model and the knowledge graph adds useful information to short text and helps its classification. |
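
The chi-square filtering step mentioned in the abstract can be illustrated with a minimal sketch. It computes the standard, unmodified chi-square statistic between a term and a category; the abstract does not specify the paper's modified variant, so this is not the authors' formula. The function names (`contingency`, `chi_square`, `top_terms`) and the toy corpus are hypothetical and serve only as illustration.

```python
def contingency(docs, labels, term, category):
    """Count documents by (contains term?, belongs to category?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1      # term present, in category
        elif has_term:
            b += 1      # term present, other category
        elif in_cat:
            c += 1      # term absent, in category
        else:
            d += 1      # term absent, other category
    return a, b, c, d


def chi_square(docs, labels, term, category):
    """Standard chi-square statistic (assumption: not the paper's modified form)."""
    a, b, c, d = contingency(docs, labels, term, category)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_terms(docs, labels, category, k):
    """Return the k terms most strongly and positively associated with a category."""
    vocab = sorted({t for tokens in docs for t in tokens})
    scored = []
    for term in vocab:
        a, b, c, d = contingency(docs, labels, term, category)
        # Chi-square is symmetric, so keep only positively associated terms.
        if a * d > b * c:
            scored.append((chi_square(docs, labels, term, category), term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy title-like corpus (hypothetical data).
docs = [
    ["stock", "market", "rises"],
    ["market", "bank", "rates"],
    ["team", "wins", "match"],
    ["team", "match", "final"],
]
labels = ["finance", "finance", "sports", "sports"]

print(top_terms(docs, labels, "finance", 2))
```

Keeping only positively associated terms is a design choice made for this sketch: without it, a word that never occurs in a category would score as highly as one that occurs only there.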