Research And Implementation Of K-means++ Algorithm Improvement And Search Application Based On Latent Semantics

Posted on: 2020-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
GTID: 2428330590995665
Subject: Logistics engineering
Abstract/Summary:
With the rapid development of the Internet and information technology, the amount of information on the network is growing rapidly. Given this growth, how to quickly and accurately extract the key information and related information a user is searching for, and thereby improve search efficiency and accuracy, has become a hot research topic in recent years. However, traditional search systems generally match content by keywords, and it is relatively difficult for them to perform a related latent semantic search that reflects the user's search requirements. In addition, the K-means algorithm and its derivative, the K-means++ algorithm, are commonly used to cluster large-scale data because they are simple to implement and converge quickly. However, because of the randomness in selecting the initial cluster centers and the choice of the number of clusters K, both K-means and K-means++ may produce unstable clustering results. Therefore, this thesis proposes an improvement of the K-means++ algorithm based on latent semantic analysis and applies it to a search system. To this end, the thesis carries out the following research.

First, the thesis studies and constructs a latent semantic model. It examines the basic principles of latent semantic analysis: text preprocessing, text segmentation, synonym merging, construction of the word-document matrix, and matrix decomposition with dimensionality reduction. The semantic similarity between documents is then calculated to construct the latent semantic model. In this way, the user's search request is understood and processed at the semantic level, and semantic links are built between the data.

Secondly, the thesis studies a K-means++ algorithm improved on the basis of latent semantic analysis. It proposes an improved algorithm based on K-means++ for further cluster analysis of data sets filtered by latent semantic analysis. The selection of the initial cluster centers is optimized: the data sets to be clustered are preprocessed, and noise points are treated to further improve the selection of the initial cluster centers. After data preprocessing, the K-means++ algorithm is further optimized on the basis of density. Each cluster center is computed as the centroid of the elements in its cluster, and the change in the cluster centers is tracked at each iteration, thereby increasing clustering efficiency and reducing the time complexity of the clustering algorithm. Experimental verification was carried out on UCI data sets commonly used for machine learning. The experimental results show that the improved algorithm outperforms the K-means++ algorithm in both clustering accuracy and clustering efficiency.

Finally, the thesis designs and implements a K-means++ search system based on latent semantic analysis. Combining the latent semantic analysis model described above with the improved K-means++ algorithm for clustering, the system quickly displays the relevant results and the latent semantic search results for the user's search content, and this is demonstrated on the system platform.
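
As an illustration only, the pipeline summarized above can be sketched in a few lines of Python with NumPy. The sketch uses textbook latent semantic analysis (truncated SVD of a word-document matrix, then cosine similarity) followed by the standard K-means++ seeding and centroid-update loop. The density-based optimization and noise-point treatment proposed in the thesis are not specified in this abstract and are not reproduced here. The toy matrix, the function names, and the choice of rank 2 are all assumptions made for the example.

```python
import numpy as np

def lsa_embed(term_doc, rank):
    """Map documents to a rank-dimensional latent space via truncated SVD."""
    # term_doc has shape (n_terms, n_docs): one column per document.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates: top right-singular vectors scaled by singular values.
    return vt[:rank].T * s[:rank]

def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def kmeanspp_init(x, k, rng):
    """Standard K-means++ seeding (not the thesis's density-based variant)."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        diff = x[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diff ** 2).sum(axis=-1).min(axis=1)
        total = d2.sum()
        # Sample the next center with probability proportional to d2.
        probs = d2 / total if total > 0 else np.full(len(x), 1.0 / len(x))
        centers.append(x[rng.choice(len(x), p=probs)])
    return np.array(centers)

def kmeans(x, k, rng, max_iter=100, tol=1e-8):
    """Lloyd iterations that stop once the centers move less than tol."""
    centers = kmeanspp_init(x, k, rng)
    for _ in range(max_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # New center = centroid of assigned points (keep old center if empty).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment against the final centers.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1), centers

# Hypothetical word-document counts: documents 0-2 use terms 0-2,
# documents 3-5 use terms 3-5.
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

docs = lsa_embed(term_doc, rank=2)
similarity = cosine_similarity(docs)
labels, centers = kmeans(docs, k=2, rng=np.random.default_rng(0))
print(np.round(similarity, 3))
print(labels)
```

On this toy input the two groups of documents end up in separate clusters. In a search setting, a query would be projected into the same latent space and compared against the cluster centers before ranking documents within the closest cluster; that step is omitted here because the abstract does not describe how the thesis implements it.
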
Keywords/Search Tags: Text Clustering, Latent Semantic Analysis, K-means Algorithm, K-means++ Algorithm