Research And Applications Of Biclustering Algorithms For High-Dimensional Data

Posted on:2010-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhou

GTID:2208360275998525

Subject:Computer application technology

Abstract/Summary:

In recent years, with developments in bioinformatics and e-commerce, more and more high-dimensional data need to be analyzed, and data mining can extract valuable information from these data for scientific research and marketing. Traditional clustering methods cluster either the rows or the columns of a data matrix, but not both simultaneously, so their results mostly reflect global information. High-dimensional data, however, usually contain a great deal of local information that traditional clustering methods cannot find. Biclustering algorithms cluster the rows and the columns of the data matrix at the same time, so they can find this local information. They improve clustering quality on high-dimensional data, especially when correlations hold only in part of the high-dimensional space. Research on biclustering is nevertheless still at an early stage, and each existing algorithm has its own shortcomings, so further research is necessary.

The main work of this thesis is as follows. First, it introduces the definition, types, and structures of biclusters, then analyzes the mathematical models and research strategies of several important biclustering algorithms and summarizes their advantages and disadvantages. Based on this analysis, the thesis proposes a Penalty strategy based Overlapping Biclustering Algorithm (POBA). The algorithm improves the iteration of the Cheng and Church algorithm, which has to replace the biclusters it has already found with random numbers. The penalty strategy lets the algorithm complete the biclustering while avoiding the interference of those random numbers with the greedy search strategy. POBA also uses a parameter to control the penalty and the overlap rate of the resulting biclusters, which makes up for the shortcomings of the Cheng and Church algorithm and helps the algorithm meet different biclustering requirements. Finally, the thesis implements the algorithm and applies it to public high-dimensional data sets. Analysis of the experimental results demonstrates the effectiveness of the algorithm, and the thesis gives some advice on setting its parameters.
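
For readers unfamiliar with the Cheng and Church algorithm named in the abstract: it scores a candidate bicluster by its mean squared residue, which is zero when every entry equals its row mean plus its column mean minus the overall mean of the submatrix. The sketch below computes that score in plain Python. The abstract does not give POBA's penalty formula, so `penalized_score`, `cover_count`, and `weight` are hypothetical names that only illustrate the general idea of discouraging overlap with a tunable parameter instead of overwriting found cells with random numbers; they should not be read as the thesis's actual method.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church mean squared residue of the submatrix (rows, cols)."""
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean[i] for i in rows) / n_r
    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue * residue
    return total / (n_r * n_c)


def penalized_score(matrix, rows, cols, cover_count, weight):
    """HYPOTHETICAL illustration, not the POBA formula from the thesis.

    Adds weight * (average number of earlier biclusters covering each cell)
    to the mean squared residue, so a larger weight discourages overlap.
    """
    covered = sum(cover_count[i][j] for i in rows for j in cols)
    msr = mean_squared_residue(matrix, rows, cols)
    return msr + weight * covered / (len(rows) * len(cols))


# Toy data: the top-left 3x3 block follows a purely additive pattern.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [0.0, 8.0, 2.0, 5.0],
]
block = [0, 1, 2]
additive = mean_squared_residue(data, block, block)
whole = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])

# Pretend an earlier bicluster already covered the 3x3 block once.
cover = [[1 if i < 3 and j < 3 else 0 for j in range(4)] for i in range(4)]
penalized = penalized_score(data, block, block, cover, weight=0.5)
```

On this toy matrix the additive block scores essentially zero while the full matrix scores higher, which is why a greedy search that lowers the residue is drawn toward such coherent submatrices. In the hypothetical penalized variant, raising `weight` makes an already-covered block less attractive, and setting it to zero allows free overlap.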

Keywords/Search Tags:

clustering analysis, bicluster, high-dimensional data, penalty strategy, biclustering algorithm

Related items

1	Analysis Of Gene Expression Data Clustering Algorithm
2	Research On Biclustering Algorithms For Gene Expression Data
3	The Design And Implementation Of Bicluster Data Analyzing Software
4	Research And Application Of Rough Clustering Algorithm For High Dimensional Data Sets
5	Research On High Dimensional Data Clustering Based On Improved Evolutionary Algorithm
6	Evolutionary Computation Based Maximum Similarity Biclustering And Application
7	The Research Of Genetic Algorithms For Biclustering On Gene Expression Data
8	Research On Clustering Algorithms For High-Dimensional Data
9	Research On Subspace Clustering Algorithm For High Dimensional Data
10	Bicluster Analysis Of Heterogeneous Panel Data Via M-Estimation