Research On Short Text Classification Method Based On Text Graph Structure

Posted on: 2022-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhao
GTID: 2568306488981219
Subject: Engineering
Abstract/Summary:
Short text classification has become an increasingly important research topic. It aims to assign labels or tags to textual units of no more than 160 characters, for pattern discovery and automatic classification. Short texts are sparse, irregular and lacking in context, so the performance of existing classification methods still needs to be improved. Methods based on deep learning have become popular. Their core idea is to learn multi-level semantic features of words, sentences and documents from context automatically, in an end-to-end manner.

However, these methods still have problems that need to be solved: (1) Semantic learning relies only on a word's context and on word co-occurrence within a text. The dependence between long-distance words under the same topic, and the association between different texts that share a latent topic, are ignored, which harms high-level semantic learning for short texts. (2) A graph structure that contains only word and text nodes cannot express the key semantic information of short texts accurately, and the mechanisms for extracting semantic associations at each level need to be improved. To address these problems, this thesis proposes a short text classification method based on text graph structure. The core of the method is to construct a heterogeneous network that learns different levels of high-order semantics by introducing topics and entities, and to design a hierarchical attention mechanism that captures key-level semantics and their relevance. The specific research work is as follows.

First, existing short text classification methods ignore the semantic relevance of long-distance words under the same topic and the sharing of latent topics between texts. To solve this issue, a short text classification method based on a word-topic-document heterogeneous network (WTDHN) is proposed. Contextual semantic vectors of words are first obtained with Word2vec. A word correlation matrix is then constructed so that latent topics can be mined from sufficient word co-occurrence information. After that, a heterogeneous network containing word, topic and document nodes is constructed, and the high-order neighborhood information between these nodes is learned through graph convolution. Experimental results on five public short text datasets show that the proposed method improves classification accuracy by 1.56% and F-score by 1.74% on average over the benchmark models.

Second, to address the insufficient expression of key word information in the word-topic-document heterogeneous graph structure, a short text classification method that incorporates entity representation into the heterogeneous graph structure is proposed. An entity-topic-document heterogeneous network (ETDHN) is built by mapping the key word nodes of the WTDHN to entity representations in a knowledge graph, which merges the semantic dependencies between entities, topics and texts. A hierarchical attention mechanism is then introduced into the heterogeneous graph convolution to capture key-level semantic information in the ETDHN. Finally, some noise is introduced into the heterogeneous network through the Drop-edge-random method, which enhances the robustness of the method. Compared with the method based on the word-topic-document heterogeneous graph structure, this method increases classification accuracy by 3.83% and F-score by 3.3% on average.
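The abstract does not give construction details, so the following is only a minimal NumPy sketch of the general idea: a heterogeneous graph over document, word and topic nodes, one graph convolution step, and random edge dropping as a source of noise. All function names, the binary edge weights, and the symmetric normalization (the common GCN form, D^-1/2 (A + I) D^-1/2) are assumptions for illustration, not the thesis implementation. The `drop_edges` helper is a generic stand-in for what the abstract calls the Drop-edge-random method.

```python
import numpy as np

def build_adjacency(doc_word, word_topic, doc_topic):
    """Assemble a symmetric block adjacency over [documents | words | topics].

    Each argument is a (hypothetical) edge-weight matrix between two node types.
    """
    n_doc, n_word = doc_word.shape
    n_topic = word_topic.shape[1]
    n = n_doc + n_word + n_topic
    w0, t0 = n_doc, n_doc + n_word          # offsets of word and topic nodes
    adj = np.zeros((n, n))
    adj[:n_doc, w0:t0] = doc_word           # document-word edges
    adj[w0:t0, t0:] = word_topic            # word-topic edges
    adj[:n_doc, t0:] = doc_topic            # document-topic edges
    return adj + adj.T                      # make the graph undirected

def normalize(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def drop_edges(adj, rate, rng):
    """Remove each undirected edge independently with probability `rate`."""
    upper = np.triu(adj, k=1)               # one copy of every undirected edge
    keep = rng.random(upper.shape) >= rate
    upper = upper * keep
    return upper + upper.T

def gcn_layer(a_norm, features, weight):
    """One graph convolution step with ReLU: max(A_norm @ X @ W, 0)."""
    return np.maximum(a_norm @ features @ weight, 0.0)

# Toy example: 2 documents, 3 words, 2 topics (values are made up).
doc_word = np.array([[1, 1, 0], [0, 1, 1]], dtype=float)
word_topic = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
doc_topic = np.array([[1, 0], [0, 1]], dtype=float)

rng = np.random.default_rng(0)
adj = build_adjacency(doc_word, word_topic, doc_topic)
noisy = drop_edges(adj, rate=0.2, rng=rng)  # perturb the graph during training
features = np.eye(adj.shape[0])             # one-hot node features
weight = rng.normal(size=(adj.shape[0], 4)) # untrained layer weights
hidden = gcn_layer(normalize(noisy), features, weight)
print(hidden.shape)
```

Stacking two such layers lets a document node aggregate information from nodes two hops away, for example from another document that shares a topic node with it, which is the kind of cross-text association the abstract describes.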
Keywords/Search Tags: Text graph structure, Topic mining, Entity representation, Heterogeneous network, Hierarchical attention mechanism, Short text classification