Research On Short Text Clustering Algorithms Based On ELMo

Posted on:2021-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:S C Zhao

GTID:2518306047982079

Subject:Software engineering

Abstract/Summary:

With the development of science and technology and the popularization of intelligent mobile terminals, the Internet industry has entered a golden age of rapid development. The massive volume of short text that people produce every day contains huge value waiting to be mined. Clustering this text is a prerequisite both for grasping the text collection as a whole and for analyzing it and making predictions from it. Better clustering of short text has therefore become a hot research direction.

At present, the traditional word2vec model can represent text conveniently, but the word vectors it trains are static and cannot solve the problem of polysemy of words in short text. The ELMo model can capture the semantic relationship between a word and its context, which addresses polysemy without making the model dimension too high. This paper proposes a short text clustering algorithm based on the ELMo language model. The main work of this paper is as follows:

Firstly, in view of the current development of short text clustering, this paper introduces the importance of short text clustering in natural language processing, as well as the problems that arise in the clustering process and their solutions. Some text clustering algorithms and distance functions are also introduced.

Secondly, to address the problem that traditional word vectors cannot distinguish the senses of polysemous words, this paper proposes using the ELMo model to train dynamic word vectors. The model then uses a GRU network to reduce the training time of the word vectors. Finally, an attention mechanism is integrated to strengthen the relationship between words and context, which can improve the accuracy with which the word vectors represent words.

Thirdly, the initial cluster centers of the traditional K-means algorithm influence the clustering results, so this paper presents an optimized K-means clustering algorithm. In this algorithm, RWMD distance is selected as the text distance, and the number of clusters is set by using an LDA model to determine the number of main topics in the text set. Several points farthest from each other are then selected as the initial cluster centers. The experimental results show that, compared with the traditional clustering algorithm, the proposed algorithm improves accuracy and converges relatively quickly.
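
The two ingredients of the optimized K-means step, RWMD as the text distance and mutually distant initial centers, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names `rwmd` and `farthest_first_centers` are hypothetical, the RWMD form used is the standard relaxed lower bound on Word Mover's Distance with Euclidean word-to-word cost, and the greedy farthest-first selection (including starting from point 0) is only one assumed reading of "several points farthest from each other". The number of clusters `k` is taken as given here; in the abstract it comes from the LDA topic count.

```python
import numpy as np


def rwmd(X, wx, Y, wy):
    """Relaxed Word Mover's Distance between two documents.

    X, Y   : (n_words, dim) arrays of word vectors.
    wx, wy : normalized word weights (each sums to 1).
    """
    # Euclidean cost between every word in X and every word in Y.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Each relaxation drops one marginal constraint, so every word sends
    # all of its mass to its nearest word in the other document.
    bound_xy = float(np.dot(wx, cost.min(axis=1)))
    bound_yx = float(np.dot(wy, cost.min(axis=0)))
    # The tighter (larger) of the two lower bounds is the RWMD.
    return max(bound_xy, bound_yx)


def farthest_first_centers(dist, k, first=0):
    """Greedily pick k mutually distant points from a distance matrix.

    Assumption: start from index `first`, then repeatedly add the point
    whose distance to its nearest already-chosen center is largest.
    """
    centers = [first]
    nearest = dist[first].copy()  # distance to the closest chosen center
    while len(centers) < k:
        nxt = int(np.argmax(nearest))
        centers.append(nxt)
        nearest = np.minimum(nearest, dist[nxt])
    return centers


# Toy documents with made-up 2-D "word vectors" (illustrative data only).
docs = [
    (np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([0.5, 0.5])),
    (np.array([[0.0, 0.1], [1.0, 0.1]]), np.array([0.5, 0.5])),
    (np.array([[9.0, 9.0]]), np.array([1.0])),
]
n = len(docs)
dist = np.array([[rwmd(*docs[i], *docs[j]) for j in range(n)] for i in range(n)])
seeds = farthest_first_centers(dist, k=2)
```

In this toy example the third document is far from the first two, so it is chosen as the second seed. The resulting seed indices would then replace the random initial centers of an ordinary K-means loop that uses the same distance matrix.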

Keywords/Search Tags:

clustering, word vector, ELMo, RWMD distance, attention mechanism

Related items

1	Research On Chinese Text Sentiment Analysis Algorithm Based On ELMo And Bi-SAN
2	Research And Application Of Chinese Short Text Clustering Algorithm Based On Word2Vec
3	Research And Application Of News Text Classification Based On Deep Learning
4	Research And Application Of Short Text Clustering Based On Word Representations
5	Research On Sequence Labeling Model Of Natural Language Processing Based On Deep Learning
6	Research On Text Generative Summarization Method Based On Attention Mechanism
7	Research On Text Sentiment Analysis Based On Deep Learning And CTM Model
8	Image Semantic Understanding Introducing Word Embedding And Attention Augmentation Mechanisms
9	Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector
10	Semantic Similarity Measurement Of Short Text By Convolutional Neural Network Based On Multi-Dimensional Attention On Word Vector