The Research Of Clustering In Data Mining With Genetic Algorithm

Posted on: 2009-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
Full Text: PDF
GTID: 2178330332481843
Subject: Computer application technology
Abstract/Summary:
Data mining is a technique that emerged in the 1990s. It is a key phase of knowledge discovery in databases (KDD) and one of the most active branches of database research and application. Data mining draws on many disciplines and techniques, and it remains a young and active field. Its goal is to find previously unknown, valuable information or patterns hidden in databases or data warehouses, and so to resolve the contradiction between abundant data and scarce knowledge. Faced with a large volume of data, the primary task is to categorize it sensibly; in practice, the problem is usually not a lack of patterns but an overabundance of them. Clustering is a reasonable way to perform this categorization.

Clustering groups data into clusters according to certain rules. The clusters are not predefined; they are determined by the characteristics of the data. Objects within a cluster are similar to each other and dissimilar to objects in other clusters. Through clustering, one can distinguish dense regions from sparse ones and find relationships between distribution patterns and data attributes. In data mining, clustering is a useful tool for understanding how data is distributed and for discovering the characteristics of each cluster, so that analysis can focus on a particular cluster. Clustering can also serve as a preprocessing step for other algorithms that operate on the resulting clusters.

Heuristic clustering algorithms are among the most popular methods, but the search efficiency of current heuristic clustering algorithms is poor. In this thesis, current heuristic clustering algorithms are studied through simulation, and a solution for improving clustering efficiency is introduced. The main content is as follows:

1. Studying several heuristic clustering algorithms, such as K-MEANS, PAM, and CLARA, and analyzing their efficiency and clustering results.
2. Studying and analyzing the CLARANS algorithm, identifying its defects, and validating the analysis by simulation.
3. While retaining the strengths of CLARANS, using a genetic algorithm (GA) to improve its efficiency, and introducing the NGA-CLARANS algorithm, which uses a niche-based GA to increase population diversity and avoid premature convergence. NGA-CLARANS has better global convergence.
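
For readers unfamiliar with CLARANS, the following is a minimal sketch of the plain randomized medoid-swap search that the thesis takes as its starting point. It is not the NGA-CLARANS algorithm, whose details the abstract does not give. The function names, the parameters `num_local` and `max_neighbor`, the Euclidean distance, and the default values are all assumptions chosen for illustration.

```python
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (by index)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=4, max_neighbor=50, seed=0):
    """Illustrative CLARANS-style search; requires len(points) > k.

    num_local:    number of random restarts (assumed name and default)
    max_neighbor: consecutive non-improving swaps tolerated before the
                  current medoid set is treated as a local minimum
    """
    rng = random.Random(seed)
    n = len(points)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_local):
        # Start from a random set of k medoid indices.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        tried = 0
        while tried < max_neighbor:
            # A neighbor differs from the current set in exactly one medoid.
            i = rng.randrange(k)
            candidate = rng.choice([j for j in range(n) if j not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                # Move to the better neighbor and reset the counter.
                current, current_cost = neighbor, cost
                tried = 0
            else:
                tried += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return sorted(best_medoids), best_cost
```

Because each restart stops at the first medoid set with no improving swap among `max_neighbor` random tries, the search can settle in a local minimum; this is the kind of weakness that a population-based method such as a GA is meant to address.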
Keywords/Search Tags: Data Mining, Clustering, Genetic Algorithm, Niche, CLARANS