Font Size: a A A

An Improved Bicluster Algorithm And Its Application

Posted on:2012-10-22Degree:MasterType:Thesis
Country:ChinaCandidate:Z ZhaoFull Text:PDF
GTID:2178330335950041Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Gene expression data are being generated by DNA and other microarray techniques and they are often presented as matrices of expression levels of genes under different conditions. One of the usual goals in expression data analysis is to group genes according to their expression under multiple conditions, or to group contions based on the expression of a number of genes. As a common means of disposing of the data set, cluster analysis was the most popular tool for identification of groups of co-expressed genes under all experimental conditions (measurements) present on the input dataset.Many clustering algorithms have been proposed in this regard. However, common disadvantage of all these clustering algorithms is that they try to find group of genes that remain co-expressed through all experimental conditions (measurements). But in reality genes tends to be co-regulated and thus co-expressed under only a few experimental conditions. They may start behaving differently under different conditions. If an input dataset has many measurements and an algorithm tries to find out group of genes expressed similarly under all measurements, then chances of finding such a group with success is very less. To overcome this problem, the concept of biclustering has emerged.To overcome these drawbacks, the Biclustering algorithms have been emerged. Yizong Cheng and Geoge M. Church introduced the biclustering algorithm to the gene expression data analysis and processing for the first time in 2000, they proposed aδ-bicluster model is called the Cheng-Church algorithm. It overcomes the limitations of the traditional clustering algorithms which can only find global information in the expression data matrix, to allow for simultaneous clustering of rows and columns, so that one can mine the local information hidden in the gene expression data matrix, which usually often has a very important biological significance.Cheng-Church algorithm is essentially a greedy algorithm, using the average value of row, column and block to define an evaluation function (mean square residue) to score a row or a column on a scale that the bank or fluctuations in the column of data consistency, according to predetermined scoring threshold to decide whether to add or remove the row or the column. It has two important steps:operation of node deletion and operation of node addition. The purpose of the node deletion is to reach a small score value, which means fluctuations in the same cluster of data blocks, the node addition operation is intended to be the size of the cluster block as large as possible. In order to improve the efficiency of the node deletion, a multiple node deletion algorithm is proposed on the basis of a single node deletion algorithm.Cheng-Church algorithm is first proposed and the most classic biclustering algorithm. However, it is also because the earliest, so the Cheng-Church algorithm is not mature, of which there are many fly in the ointment, such as clustering to find more than one blocks, it uses random numbers to cover the matrix element found in the prior block. In one hand, the uncertainty of random number caused much damage to the authenticity and credibility of the original expression matrix data. On the other hand, result from the previously found cluster blocks are concealed, the ability of Cheng-Church algorithm to find overlap between blocks is greatly reduced. These imperfections are looking forward to improve it later, the theme of this paper is just an improvement of the this classical algorithm.This paper begins talking about the traditional clustering algorithm, leads to the meaning of bicluster algorithm, then introduce the definition, mathematical model, type, structure of bicluster and the existing algorithm and its classification. After detailedly addressing the classic Cheng-Church algorithm, the focus for its shortcomings has been improved step by step, then the time complexity of the improved algorithm and its advantage compared with the Cheng-Church algorithm will be analyzed, several important parameters in the improved algorithm are discussed. Finally, experiments to verify by comparison the effectiveness of this improvement, and to apply it to two actual projects.
Keywords/Search Tags:Bicluster, Cheng-Church Algorithm, DNA Chip, Bioinformatics
PDF Full Text Request
Related items