| In recent years, with the development of bioinformatics and e-commerce, more and more high-dimensional data need to be analyzed, and data mining can extract valuable information from these data for scientific research and marketing. Traditional clustering methods cluster either the rows or the columns of a data matrix, but not both simultaneously, so their results mostly reflect global information. High-dimensional data, however, often contain a great deal of local information that traditional clustering cannot find. Biclustering algorithms cluster the rows and columns of the data matrix at the same time, so they can find this local information; they improve clustering quality for high-dimensional data, especially when correlations hold only in part of the high-dimensional space. Research on biclustering is nevertheless still at an early stage, and each existing algorithm has its own shortcomings, so further research is necessary. The main work of this paper is as follows. First, it introduces the definition, types, and structures of biclusters, then analyzes the mathematical models and research strategies of several important biclustering algorithms and summarizes their advantages and disadvantages. Based on this analysis, the paper proposes a Penalty strategy based Overlapping Biclustering Algorithm (POBA). The algorithm focuses on improving the iteration of the Cheng and Church algorithm, which must replace the biclusters it has found with random numbers. The penalty strategy lets the algorithm complete the biclustering while avoiding the interference of those random numbers with the greedy search strategy.
POBA also uses a parameter to control the penalty and the overlap rate of the resulting biclusters, which makes up for the shortcomings of the Cheng and Church algorithm and helps the algorithm satisfy different biclustering demands. Finally, the paper implements the algorithm and uses it to bicluster public high-dimensional data sets. By analyzing the experimental results, the paper demonstrates the effectiveness of the algorithm and gives some advice on setting its parameters. |
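
To make the idea of "local information" concrete, the sketch below computes the mean squared residue, the coherence score that the Cheng and Church algorithm minimizes greedily. The abstract does not give this formula, so it is taken here from the standard formulation of that algorithm. The function name, the toy matrix, and the index lists are illustrative assumptions, and the sketch does not implement POBA's penalty strategy, which the abstract does not specify.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    A score near 0 means the selected rows follow the same additive
    pattern across the selected columns (a coherent bicluster).
    """
    n_rows, n_cols = len(rows), len(cols)
    # Mean of each selected row over the selected columns.
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_cols for i in rows}
    # Mean of each selected column over the selected rows.
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_rows for j in cols}
    # Mean of the whole submatrix.
    all_mean = sum(row_mean.values()) / n_rows
    total = sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
        for i in rows
        for j in cols
    )
    return total / (n_rows * n_cols)


# Hypothetical toy data: rows 0-2 share an additive pattern on columns 0-2,
# while column 3 and row 3 do not follow it.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

local_score = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
global_score = mean_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print(local_score, global_score)
```

The coherent submatrix scores essentially zero while the full matrix scores much higher, which is why a method restricted to all rows or all columns can miss such a pattern.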