Research on Clustering Aggregation and Its Application in Corn Breed Selection

Posted on: 2012-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
GTID: 2143330332499204
Subject: Computer software and theory
Abstract/Summary:
This work is supported by the National High Technology Research and Development Program of China (Digital Agricultural Knowledge Grid Technology Research and Application). We first propose two algorithms based on clustering aggregation to analyze the relationships among the properties of corn breeds, and then, combining them with web services, design and implement a corn breed selection system based on a B/S (browser/server) architecture.

Clustering plays an important role in data analysis. It partitions a data set into several disjoint clusters, mainly according to the similarity (or distance) between data patterns. Through cluster analysis we can mine useful information from large data sets and apply it in various research fields and practical applications. Although many clustering methods exist, each has shortcomings that can prevent it from producing a good partition of a given data set. Clustering aggregation is an enhancement of clustering. It first obtains many partitions of the data set by running various clustering algorithms and treats these partitions as different views of the data. It then calculates the similarity between data patterns from those partitions, redistributes the patterns, and finally produces a clustering result that is consistent with most of the partitions.

This paper studies clustering and clustering aggregation and proposes two algorithms: the c-means clustering algorithm and the CAVM clustering aggregation method. c-means is an improved algorithm based on k-means. It can cluster mixed data sets with both categorical and numeric attributes.
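
The aggregation idea described above (several base partitions combined into one partition that agrees with most of them) can be illustrated with a small sketch. This is not the CAVM method of the thesis. It is a generic co-association approach, swapped in for illustration, and the function names and the 0.5 threshold are assumptions: two patterns are placed in the same final cluster when they share a cluster in more than half of the base partitions.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of patterns shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Merge patterns whose co-association exceeds `threshold` (assumed value).

    Clusters are the connected components of the thresholded similarity
    graph, found with a small union-find structure.
    """
    n = len(partitions[0])
    sim = co_association(partitions)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Each base partition is given as a list of cluster labels, one per pattern. Label values need not match across partitions, because only co-membership is counted.
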
There are three differences between c-means and k-means: (1) the number of clusters and the choice of initial cluster centers depend mainly on the number of categorical attributes; (2) the distance between data patterns is not the Euclidean distance but is expressed in vector form as the distance between a pattern and the cluster center, taking both categorical and numeric attributes into account; (3) the convergence function is also computed in vector form, by summing all the distances between the patterns and the cluster center within every cluster, and the clustering process ends when this sum no longer changes.

CAVM is built on the c-means algorithm and is a voting-based aggregation method. It can be used for cluster analysis of data sets with mixed attributes. CAVM uses voting twice to correct the clustering process: the first vote amends cluster membership, and the second amends the final clustering result during combination. Unlike general clustering aggregation methods, CAVM first obtains an initial partition from the different values of a categorical attribute, then uses c-means to cluster the data set while voting on the patterns, and finally partitions the data set according to the voting result.

We collected a total of 122 corn breeds that are suitable for planting in Jilin Province. By observing and analyzing the properties of the breed data, we found some correlations between attributes. To identify and verify these correlations, we reduced the dimensionality of the property information of the 122 corn breeds, standardized it, and then applied the c-means and CAVM algorithms to the data set, mining two association rules: the "growth period-breed style" rule and the "planting regions-breed style" rule.
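
Differences (2) and (3) above can be made concrete with a short sketch. The abstract does not give the exact vector form used by c-means, so the code below swaps in a k-prototypes-style dissimilarity as an assumption: squared difference on numeric attributes plus a weighted mismatch count on categorical ones. The function names, the `gamma` weight, and the attribute layout are hypothetical.

```python
def mixed_distance(pattern, center, numeric_idx, categorical_idx, gamma=1.0):
    """Dissimilarity between a pattern and a cluster center over mixed attributes.

    Numeric attributes contribute squared differences; categorical attributes
    contribute `gamma` for every mismatch. `gamma` is an assumed weight.
    """
    numeric_part = sum((pattern[i] - center[i]) ** 2 for i in numeric_idx)
    mismatch_part = sum(1 for i in categorical_idx if pattern[i] != center[i])
    return numeric_part + gamma * mismatch_part


def assign(patterns, centers, numeric_idx, categorical_idx, gamma=1.0):
    """Assign each pattern to its nearest center.

    Returns the labels and the summed distance to the assigned centers.
    An iterative algorithm could stop once this sum no longer changes.
    """
    labels = []
    total = 0.0
    for p in patterns:
        dists = [
            mixed_distance(p, c, numeric_idx, categorical_idx, gamma)
            for c in centers
        ]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best)
        total += dists[best]
    return labels, total
```

Here a pattern is a tuple such as `(0.2, "early")`, where position 0 is a standardized numeric attribute and position 1 a categorical one. These values are made up for illustration and are not taken from the 122-breed data set.
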
The two association rules are largely consistent with agricultural knowledge, which shows that the c-means and CAVM algorithms can identify the internal structure of the data set. In addition, to verify the accuracy of the two algorithms, we compared c-means and CAVM with the k-means algorithm experimentally, and the results show that c-means and CAVM have clear advantages in handling mixed data sets.

Most current agricultural websites offer detailed crop information and online expert question-and-answer pages, which both provide knowledge to users and help solve difficult problems. However, these sites have two shortcomings: (1) when selecting a breed, the user must understand and analyze the properties of the breeds, take the local conditions into account, and choose only after a comprehensive comparison, which increases the user's burden; (2) an online expert can give the user a more authoritative answer, but if the expert is not online, or the question is answered only after some delay, the answer loses its timeliness.

To address these shortcomings, and because no comparable, complete corn breed selection system yet exists, this paper designs and implements a corn breed selection system based on the 122 corn breeds. The system was designed and implemented with Microsoft Visual Studio 2008, IIS, and SQL Server 2005, and involved C# and ASP.NET programming, Internet-based web services, and the design and management of database tables. Notably, the system uses the two mined association rules to translate the user's input into query constraints, which helps guide the user in selecting a breed. The final breed selection system has two parts: the primary query and the advanced query. The primary query requires little user input and is intended mainly for users with simple needs.
The advanced query supports more constraints, and users can enter various needs and constraints according to their actual situations. After the needs and constraints are entered, the system returns the corn breeds that satisfy them. The system also gives detailed information about each breed. In addition to breed name, height, growth period, breed style, planting conditions, and planting regions, it includes yield, disease resistance, lodging resistance, and the key planting points that users care about. This information not only deepens the user's understanding of the breed but also guides the user in planting corn better.

Compared with general agricultural websites, the corn breed selection system we designed is much more convenient and accurate. It does not require the user to analyze breed properties or weigh many factors such as planting region, local wind, and high yield. The user only needs to enter their requirements, and the system quickly returns breeds that meet them. The system does have some shortcomings. For example, it is not intelligent enough and cannot handle some special cases. Its reasoning mechanism is limited to only two constraint rules, so it cannot handle some special breeds.
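
The way a mined rule can turn user input into query constraints, as described above, can be sketched as follows. The thesis system was built with C#, ASP.NET, and SQL Server 2005; the Python below is only an illustration under stated assumptions. The abstract does not give the contents of the two rules, so `REGION_TO_STYLES`, all field names, and all values here are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breed:
    name: str
    growth_period_days: int
    breed_style: str
    planting_region: str


# Hypothetical stand-in for a "planting regions-breed style" rule:
# each region maps to the breed styles considered suitable there.
REGION_TO_STYLES = {
    "region_a": {"style_x"},
    "region_b": {"style_x", "style_y"},
}


def build_constraints(user_input, region_rule=REGION_TO_STYLES):
    """Expand user input with a style constraint implied by the region rule."""
    constraints = dict(user_input)
    region = user_input.get("planting_region")
    if region in region_rule and "breed_style" not in constraints:
        constraints["breed_style_in"] = region_rule[region]
    return constraints


def select_breeds(breeds, constraints):
    """Return the breeds that satisfy every constraint that is present."""
    selected = []
    for b in breeds:
        region = constraints.get("planting_region")
        if region is not None and b.planting_region != region:
            continue
        style = constraints.get("breed_style")
        if style is not None and b.breed_style != style:
            continue
        styles = constraints.get("breed_style_in")
        if styles is not None and b.breed_style not in styles:
            continue
        max_days = constraints.get("max_growth_period_days")
        if max_days is not None and b.growth_period_days > max_days:
            continue
        selected.append(b)
    return selected
```

In this sketch a primary query would pass only one or two keys, such as `planting_region`, and rely on the rule to narrow the breed style, while an advanced query would pass additional keys such as `max_growth_period_days`.
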
Keywords/Search Tags: Clustering, Clustering aggregation, Constraint query, Breed selection