
Research On Short Text Clustering Algorithms Based On ELMo

Posted on: 2021-07-18 | Degree: Master | Type: Thesis
Country: China | Candidate: S C Zhao | Full Text: PDF
GTID: 2518306047982079 | Subject: Software engineering
Abstract/Summary:
With the development of science and technology and the spread of smart mobile devices, the Internet industry has entered a period of rapid growth. The massive volume of short text that people produce every day contains considerable value waiting to be mined. Clustering this short text is a prerequisite both for grasping a text collection as a whole and for analyzing it and making predictions from it, so better short text clustering has become an active research direction. The traditional word2vec model represents text conveniently, but the word vectors it trains are static and cannot handle words in short text that have several meanings (polysemy). The ELMo model captures the semantic relationship between a word and its context, which addresses polysemy without making the model dimension too high. This thesis proposes a short text clustering algorithm based on the ELMo language model. The main work is as follows.

Firstly, the thesis reviews the current state of short text clustering and its importance in natural language processing, together with the problems that arise in short text clustering and their existing solutions. It also introduces several text clustering algorithms and distance functions.

Secondly, to address the polysemy that traditional word vectors cannot distinguish, the thesis proposes using the ELMo model to train dynamic word vectors. The model uses a GRU network to reduce the training time of the word vectors, and it integrates an attention mechanism to strengthen the relationship between words and their context, which improves how accurately the word vectors represent the words.

Thirdly, the initial cluster centers of the traditional K-means algorithm influence the clustering results, so the thesis presents an optimized K-means clustering algorithm. In this algorithm, RWMD distance is used as the text distance, and the number of clusters is set to the number of main topics that an LDA model identifies in the text set. Several points that are farthest from one another are then selected as the initial cluster centers. Finally, the experimental results show that the proposed algorithm is more accurate than the traditional clustering algorithm and converges relatively quickly.
Keywords/Search Tags:clustering, word vector, ELMo, RWMD distance, attention mechanism
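
The abstract does not spell out how the mutually distant initial centers are chosen, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a greedy farthest-first selection seeded with the most distant pair of documents, uniform word weights in the RWMD computation, and hand-made two-dimensional toy word vectors standing in for ELMo embeddings. The names `rwmd`, `farthest_first_centers`, `vectors`, and `docs` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rwmd(doc_a, doc_b, vectors):
    """Relaxed Word Mover's Distance between two tokenized documents.

    Assumption: every token carries uniform weight 1/len(doc). With one
    transport constraint dropped, each word sends all of its mass to the
    nearest word of the other document. The larger of the two one-sided
    costs is the tighter lower bound on the full Word Mover's Distance.
    """
    def one_sided(src, dst):
        total = sum(min(euclidean(vectors[w], vectors[x]) for x in dst)
                    for w in src)
        return total / len(src)

    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


def farthest_first_centers(docs, k, dist):
    """Return the indices of k mutually distant documents (assumed scheme)."""
    if not 2 <= k <= len(docs):
        raise ValueError("k must be between 2 and len(docs)")
    # Seed with the two documents that are farthest apart.
    first, second = max(combinations(range(len(docs)), 2),
                        key=lambda p: dist(docs[p[0]], docs[p[1]]))
    centers = [first, second]
    while len(centers) < k:
        # Add the document whose nearest chosen center is farthest away.
        rest = [n for n in range(len(docs)) if n not in centers]
        centers.append(max(
            rest,
            key=lambda n: min(dist(docs[n], docs[c]) for c in centers)))
    return centers


# Toy 2-D word vectors (illustrative only, not real embeddings).
vectors = {
    "cat": (0.0, 0.0), "dog": (0.0, 1.0),
    "stock": (10.0, 0.0), "bond": (10.0, 1.0),
    "rain": (0.0, 10.0), "snow": (1.0, 10.0),
}
# Three topics, two short documents each (indices 0-1, 2-3, 4-5).
docs = [["cat", "dog"], ["dog", "cat", "cat"],
        ["stock", "bond"], ["bond", "stock"],
        ["rain", "snow"], ["snow", "rain", "rain"]]

centers = farthest_first_centers(docs, 3, lambda a, b: rwmd(a, b, vectors))
print(centers)
```

On this toy data the three selected indices fall in three different topic groups, which is the property the initialization is meant to provide. In the thesis, `k` would come from the number of main topics found by the LDA model rather than being fixed by hand.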