Water Conservancy Text Classification Model Based On LSTM And K-means Clustering

Posted on:2022-05-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Zhang

GTID:2492306539473804

Subject:Agricultural engineering and information technology

Abstract/Summary:

With the development of the Internet and artificial intelligence technology, paper books, documents, certificates, and other texts have generated a large volume of electronic text and are gradually being replaced by it. Finding and selecting consistent, valid information in electronic text is therefore very important. The goal of text processing is to improve how text is managed and to make it easy for users to obtain the information they need. In practical applications, text processing can be summarized as text representation, text classification, text clustering, and so on. Text representation converts text into an array or numeric vector that carries a specific meaning the machine can work with. Text classification assigns texts to categories that have been determined in advance, based on their content. Text clustering groups documents into clusters based on the features of the documents. This thesis proposes a water conservancy text processing model based on LSTM (Long Short-Term Memory) and K-means clustering and applies it to water conservancy news. The specific research contents and results are as follows:

1. Chinese word segmentation. Given the particular characteristics of Chinese text and the domain studied here, this thesis uses Jieba, a Python-based word segmentation tool. To improve segmentation quality, the Jieba dictionary is extended with professional terms from the field of water conservancy news.

2. Text representation. The skip-gram model in Word2vec is used to vectorize the segmented text, producing a word vector for each word in the text data. The text representation results are then optimized. Finally, the model outputs are stacked vertically into a two-dimensional matrix in which each row represents the features of one word, and this matrix is fed into the K-means clustering model.

3. Combined model. This thesis introduces deep learning theory and constructs a model that combines LSTM with the K-means algorithm to process water conservancy news text. This approach avoids two defects of traditional text processing methods: neglect of the relationships between words, and training that easily falls into a local optimum. Finally, accuracy, recall, and F1 score are used to evaluate the text processing results, and the results show that combining the LSTM model with the K-means algorithm achieves a better text processing effect.
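The stacking, clustering, and evaluation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the two-dimensional word vectors, the example words, the labels, and the function names `kmeans` and `evaluate` are all hypothetical, the LSTM stage is omitted entirely, and the K-means shown is the plain textbook version written with the Python standard library only. The abstract names accuracy, recall, and F1; precision is computed as well because F1 is defined from precision and recall.

```python
import random


def kmeans(rows, k, iters=100, seed=0):
    """Plain Lloyd's K-means over a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    # Start from k rows picked at random; in general the result can depend
    # on this choice, which is one way K-means ends up in a local optimum.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = []
        for row in rows:
            dists = [sum((x - c) ** 2 for x, c in zip(row, cen))
                     for cen in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def evaluate(true, pred, positive):
    """Accuracy, precision, recall, and F1 for one class of interest."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical word vectors standing in for skip-gram output.
word_vectors = {
    "reservoir": [0.9, 0.1],
    "dam": [0.8, 0.2],
    "flood": [0.1, 0.9],
    "rainfall": [0.2, 0.8],
}
words = list(word_vectors)

# Stack the vectors vertically: one row per word, one column per dimension.
matrix = [word_vectors[w] for w in words]

labels, centroids = kmeans(matrix, k=2)
for word, label in zip(words, labels):
    print(word, "-> cluster", label)

# Hypothetical gold and predicted categories for four news items.
true = ["flood", "flood", "drought", "drought"]
pred = ["flood", "drought", "drought", "drought"]
scores = evaluate(true, pred, positive="flood")
print(scores)
```

On this toy data the two infrastructure words and the two weather words end up in separate clusters, although which cluster receives index 0 depends on the random initialization. Real skip-gram vectors typically have tens to hundreds of dimensions, and a library implementation of K-means would normally be preferred over this hand-written loop.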

Keywords/Search Tags:

Text representation, Text classification, Text clustering, Long Short-Term Memory, K-means

Related items

1	Research On Analysis And Mining Method Of Railway System Fault Text Data Based On Machine Learning
2	Research And Implementation Of Utility Pole Nameplate Recognition System In Complex Environment
3	The Research And Application Of Short Text Classification Based On Machine Learning In Nuclear Power Quality Management
4	Multi-label Classification Of Power Defect Text Based On Deep Learning And Normative Evaluation
5	Text Recognition Of Relay Protection Equipment Image Based On Deep Learning
6	Text Recognition In Images Of Traffic Scenes
7	Research On Text Multi-tag Classification Model Of Rail Transit Equipment Failure
8	Research On Intelligent Recognition And Application Of Alarm Information In Power Grid Based On Text Mining
9	Design And Implementation Of Power Grid Device Error Report Management System Based On Text Analysis
10	Research On Street Sign Text Recognition Method Based On Deep Learning