| In the age of Big Data, the value of data is becoming increasingly attractive, and data mining is widely used as an important means of extracting it. Discovering latent value in data through data mining is becoming a significant source of productivity in all trades and professions. To discover regular patterns and high-value customers in the data and to offer them differentiated service, this paper partitions the customers of IM software with clustering algorithms from data mining, according to the number of visits and the traffic used by each application. Data preprocessing is an important phase of data mining: it links the understanding of the business and the data with model building and algorithm design, and it directly influences the clustering result. Good results cannot be obtained unless the original data are adequately understood, analyzed, and processed before mining. To match the requirements and the algorithms well, the paper preprocesses the original data on the basis of an understanding of the business and the data, obtains the final data set for clustering, and describes the preprocessing in detail. Two clustering algorithms are used in this paper: K-means and a biclustering algorithm based on Large Average Submatrices (LAS). First, the paper applies the traditional K-means clustering algorithm to partition the data according to its features, and presents and interprets the clustering result. Then, when partitioning the data with the biclustering algorithm, the paper modifies the algorithm and the score function S(·) on the basis of the LAS model, proposed by Shabalin in 2009, and the features of the data. After modification, the algorithm and the score function suit the data well, and the resulting biclusters can be interpreted in terms of the business requirements. The improvement to the algorithm greatly reduces its complexity. 
The improvement to the score function not only makes it fit the data and reduces the complexity of the algorithm but, more importantly, allows different parameters to be chosen according to the features of different data sets, which makes the whole algorithm smarter. |