Research And Implementation Of K-means++ Algorithm Improvement And Search Application Based On Latent Semantics

Posted on:2020-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:Z Liu

Full Text:PDF

GTID:2428330590995665

Subject:Logistics engineering

Abstract/Summary:

With the rapid development of the Internet and information technology, the amount of information on the network is growing rapidly. Faced with this growing volume, the problem of how to quickly and accurately extract the key information and related information that users search for, and thereby improve search efficiency and accuracy, has become a hot research topic in recent years. However, traditional search systems generally match content by keywords, which makes it difficult to perform related latent semantic searches according to users' search requirements. In addition, the K-means algorithm and its derivative, the K-means++ algorithm, are commonly used to cluster large-scale data because they are simple to implement and converge quickly. However, because the initial cluster centers are selected randomly and the number of clusters K must be fixed in advance, the clustering results of both algorithms can be unstable. Therefore, this paper proposes an improved K-means++ algorithm based on latent semantic analysis and applies it to a search system. To this end, the paper carries out the following research.

First, this paper studies and constructs a latent semantic model. It examines the basic principles of latent semantic analysis, covering text preprocessing, text segmentation, synonym merging, construction of the word-document matrix, and matrix decomposition with dimensionality reduction. Finally, the semantic similarity between documents is calculated to construct the latent semantic model, so that users' search requests can be understood and processed at the semantic level and semantic links can be built between the data.

Second, this paper studies an improved K-means++ algorithm based on latent semantic analysis, which performs further cluster analysis on the data sets filtered by the latent semantic model. The improvement targets the selection of the initial cluster centers: the data sets to be clustered are first preprocessed and noise points are handled, which further improves the choice of initial centers. After preprocessing, the K-means++ algorithm is further optimized based on density; each cluster center is computed as the centroid of the elements in its cluster, and the change in the cluster centers is tracked in each iteration of the loop, thereby increasing clustering efficiency and reducing the time complexity of the clustering algorithm. Experiments were carried out on data sets from the UCI repository that are commonly used in machine learning. The results show that the improved algorithm outperforms the K-means++ algorithm in both clustering accuracy and clustering efficiency.

Finally, this paper designs and implements a K-means++ search system based on latent semantic analysis. By combining the latent semantic model above with the improved K-means++ algorithm for clustering, the system quickly displays both the directly relevant results and the latent semantic search results for the user's search content, and this is demonstrated on the system platform.
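The latent semantic model described in the abstract (word-document matrix, matrix decomposition with dimensionality reduction, document similarity) can be illustrated with a short Python sketch using NumPy. This is not the thesis's implementation: it assumes documents have already been segmented and synonym-merged into token lists, it uses raw term counts rather than any particular weighting scheme, and it assumes the decomposition is a truncated singular value decomposition (SVD), the usual choice in latent semantic analysis. The function names (`build_term_document_matrix`, `lsa`, `project_query`, `cosine`, `document_similarity`) and the toy documents are hypothetical.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Build a term-by-document count matrix from pre-tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[index[term], j] += 1.0  # raw term frequency (assumption: no TF-IDF weighting)
    return A, index


def lsa(A, k):
    """Rank-k truncated SVD; returns the term basis U_k and documents in latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    doc_vectors = (Uk.T @ A).T  # row j equals column j of S_k V_k^T: document j in k dimensions
    return Uk, doc_vectors


def project_query(query_terms, index, Uk):
    """Map a query into the same latent space as the documents; unknown terms are ignored."""
    q = np.zeros(Uk.shape[0])
    for term in query_terms:
        if term in index:
            q[index[term]] += 1.0
    return Uk.T @ q


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


def document_similarity(doc_vectors):
    """Pairwise cosine similarity between documents in latent space."""
    n = len(doc_vectors)
    return np.array([[cosine(doc_vectors[i], doc_vectors[j]) for j in range(n)] for i in range(n)])


if __name__ == "__main__":
    # Hypothetical toy corpus, already segmented and synonym-merged.
    docs = [
        ["cluster", "kmeans", "center"],
        ["cluster", "center", "centroid"],
        ["search", "query", "semantic"],
        ["semantic", "latent", "query"],
    ]
    A, index = build_term_document_matrix(docs)
    Uk, doc_vectors = lsa(A, k=2)
    q = project_query(["kmeans"], index, Uk)
    scores = [cosine(q, d) for d in doc_vectors]
    # Documents 0 and 1 score close to 1 (document 1 never mentions "kmeans");
    # documents 2 and 3 score close to 0.
    print(np.round(scores, 3))
    print(np.round(document_similarity(doc_vectors), 3))
```

Under these assumptions the rank reduction is what produces the latent match: in the raw count space document 1 shares no term with the query, yet after projection it points in the same direction as document 0. A real system would substitute its own preprocessing, weighting and choice of k.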
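The density-based improvement to K-means++ can be sketched in the same hedged way. The abstract does not give the exact density rule, so this sketch assumes one common design rather than the thesis's own: a point's local density is the number of neighbours within a radius `eps`; points with fewer than `min_pts` neighbours are treated as noise and excluded from seeding; the first center is the densest point (replacing the random first pick of standard K-means++); and each later center maximizes density times the squared distance D(x)^2 to the nearest chosen center, a deterministic variant of K-means++, which instead samples new centers with probability proportional to D(x)^2. The iteration then recomputes each center as the centroid of its members and stops once the largest center shift falls below `tol`. The parameters `eps`, `min_pts`, `tol`, `max_iter` and all function names are hypothetical.

```python
import numpy as np


def local_density(X, eps):
    """Count, for each point, the other points lying within distance eps."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # O(n^2) memory
    return (dist <= eps).sum(axis=1) - 1  # subtract 1 to exclude the point itself


def density_seeding(X, k, eps, min_pts):
    """Pick k initial centers from non-noise points, preferring dense, well-separated ones."""
    density = local_density(X, eps)
    candidates = np.flatnonzero(density >= min_pts)  # low-density points are treated as noise
    if len(candidates) < k:
        raise ValueError("fewer non-noise points than clusters; adjust eps or min_pts")
    chosen = [candidates[np.argmax(density[candidates])]]  # densest point seeds the first cluster
    for _ in range(1, k):
        C = X[chosen]
        # squared distance from each candidate to its nearest already-chosen center
        d2 = ((X[candidates, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(candidates[np.argmax(density[candidates] * d2)])
    return X[chosen].astype(float)


def assign(X, centers):
    """Index of the nearest center for every point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, centers, tol=1e-6, max_iter=100):
    """Lloyd iterations: move each center to its members' centroid until the largest shift < tol."""
    centers = centers.copy()
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the previous center if a cluster becomes empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:
            break
    return assign(X, centers), centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_a = rng.normal([0.0, 0.0], 0.3, size=(20, 2))
    blob_b = rng.normal([5.0, 5.0], 0.3, size=(20, 2))
    # Isolated point that would receive a large share of K-means++'s D(x)^2 sampling weight.
    outlier = np.array([[20.0, -20.0]])
    X = np.vstack([blob_a, blob_b, outlier])
    seeds = density_seeding(X, k=2, eps=1.0, min_pts=3)
    labels, centers = kmeans(X, seeds)
    print(labels)
    print(centers)
```

Making the seeding deterministic removes the run-to-run variation caused by random initial centers, which is the source of instability the abstract points to. The full pairwise-distance matrix keeps the sketch short but costs O(n^2) memory, so large data sets would need a spatial index or sampling for the density step.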

Keywords/Search Tags:

Text Clustering, Latent Semantic Analysis, K-means Algorithm, K-means++ Algorithm

Related items

1	Chinese Text Clustering Based On Latent Semantic And Its Applications
2	Research On Text Clustering Algorithm Based On Latent Semantic Indexing
3	Study Of Chinese Text Clustering On Improved K-means Algorithm
4	Text Clustering Based On K-means Algorithm And Realization
5	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
6	Research And Implementation Of Text Clustering Based On Fuzzy C-Means Clustering Algorithm
7	Based On K-means The Chinese Text Clustering Algorithm
8	Based On The Text Of The K-means Clustering Analysis
9	An Improved K-Means Algorithm And Its Application In Bidding Data Analysis
10	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel