
Water Conservancy Text Classification Model Based on LSTM and K-means Clustering

Posted on: 2022-05-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Zhang
GTID: 2492306539473804  Subject: Agricultural engineering and information technology
Abstract/Summary:
With the development of the Internet and artificial intelligence, paper books, documents, certificates, and other texts have generated a large volume of electronic text and are gradually being replaced by it. Searching for and selecting consistent, valid information from this electronic text is therefore of great importance. The goal of text processing is to improve how text is managed and to make it easy for users to obtain the information they need. In practical applications, text processing can be divided into text representation, text classification, text clustering, and related tasks. Text representation converts text into an array or numeric vector that carries a specific meaning a machine can process. Text classification assigns texts to categories that have been determined in advance, based on the content of each text. Text clustering groups documents into clusters based on the features of the documents. This thesis proposes a water conservancy text processing model based on LSTM (Long Short-Term Memory) and K-means clustering, and applies it to water conservancy news. The specific research contents and results are as follows:

1. Chinese word segmentation. Given the particular characteristics of Chinese text and the domain studied here, this thesis uses the Python-based Jieba word segmentation tool. To improve segmentation quality, the Jieba dictionary is extended with professional terms from the field of water conservancy news.

2. Text representation. The skip-gram model in Word2vec is used to vectorize the segmented text, producing a word vector for each word in the text data. The text representation results are then optimized. Finally, the model outputs are stacked vertically so that the features of each word in the text data are represented as a two-dimensional matrix, which is then fed into the K-means clustering model.

3. Model construction. This thesis introduces deep learning theory and constructs a model that combines LSTM with the K-means algorithm to process water conservancy news text. This approach avoids two shortcomings of traditional text processing methods: neglecting the relationships between words, and training that easily falls into local optima. Finally, precision, recall, and F1 score are used to evaluate the text processing results, and the results show that combining the LSTM model with the K-means algorithm achieves a better text processing effect.
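The evaluation metrics named in the abstract can be illustrated with a small sketch. The abstract does not say how the metrics were averaged across categories, so the macro average below is an assumption, and the category labels ("flood", "drought", "project") and predictions are hypothetical toy data, not results from the thesis.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one category treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-category precision, recall and F1 (an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical gold labels and model predictions for six news items.
y_true = ["flood", "flood", "drought", "project", "project", "project"]
y_pred = ["flood", "drought", "drought", "project", "project", "flood"]

for label in sorted(set(y_true)):
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")

macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
print(f"macro: precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high, which is why it is commonly reported alongside the two.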
Keywords/Search Tags:Text representation, Text classification, Text clustering, Long Short-Term Memory, K-means
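As a rough illustration of the clustering step, the sketch below runs a plain K-means loop over a small matrix whose rows stand in for stacked word vectors. It is a minimal sketch under stated assumptions, not the thesis's implementation: the two-dimensional toy vectors, the function names `squared_distance` and `kmeans`, the seeded random initialization, and the choice of k = 2 are all hypothetical, and the LSTM component is not modeled here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct rows chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the loop has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical matrix: one row per word vector, two well-separated groups.
word_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],
]

assignments, centroids = kmeans(word_vectors, k=2)
print(assignments)
```

Because K-means only converges to a local optimum that depends on the initial centroids, practical use typically repeats the run with several initializations and keeps the best result.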