| Microarray technologies provide a tool for studying large-scale gene expression relationships. One of the fundamental principles of biological organization is modularity, i.e., genes can be grouped into modules according to their expression profiles. In general, clustering algorithms are used to group gene expression profiles and then extract useful patterns. In the first part of this dissertation, we propose a new hierarchical clustering algorithm, called the dynamically growing self-organizing tree (DGSOT) algorithm, which overcomes drawbacks of traditional hierarchical clustering algorithms. The DGSOT algorithm combines horizontal growth and vertical growth to construct a multifurcating hierarchical tree from top to bottom to cluster the data. In addition, we propose a new cluster validation criterion, called Cluster Separation, for finding the proper number of clusters at each hierarchical level. We also propose a K-level up distribution (KLD) mechanism, which increases the scope of data distribution during hierarchy construction, to improve clustering accuracy. The clustering result of the DGSOT can be easily displayed as a dendrogram for visualization. Based on yeast cell cycle microarray expression profiles, we found that the hierarchical structure of the DGSOT clustering results is more reasonable than that of the Self-Organizing Tree Algorithm (SOTA) results. Furthermore, the enrichment of biological functions in the clusters is considerably higher. However, such clustering algorithms require the user to predefine a threshold to obtain quality clusters. In the second part of this dissertation, we propose a new algorithm based on random matrix theory, called random matrix modeling (RMM), to automatically reveal gene coexpression modules from microarray expression profiles. The similarity threshold obtained by the RMM is derived from the inherent characteristics of the input dataset. 
We evaluated the RMM on an in silico modular network model and demonstrated it on yeast cell cycle microarray expression profiles. The statistical analyses show that the obtained modules are of biological origin and stable to noise. Furthermore, the structural properties of the modules are shown to follow the common properties of typical biological systems. To the best of our knowledge, the RMM is the first algorithm that presents an objective mathematical criterion for deciding the best threshold to reveal gene coexpression modules. It is shown to be a robust, sensitive, and valid method for revealing gene coexpression modules from microarray profiles. |
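
As a rough illustration of the role a similarity threshold plays when extracting coexpression modules, the sketch below links genes whose absolute Pearson correlation meets a cutoff and reports the connected groups as modules. It is not an implementation of the RMM or the DGSOT: the threshold here is hand-picked, which is exactly the manual step the RMM is described as replacing, and the function names, gene names, and toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Constant profiles have zero variance; treat them as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_modules(profiles, threshold):
    """Group genes into modules by thresholding |correlation|.

    profiles:  dict mapping gene name -> list of expression values
    threshold: hand-picked cutoff (an assumption of this sketch)
    """
    genes = sorted(profiles)
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g):
        # Follow parent links to the root, compressing the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Link every pair of genes whose similarity meets the threshold.
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(profiles[a], profiles[b])) >= threshold:
                parent[find(a)] = find(b)

    # Collect genes by root; each group is one module.
    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(modules.values())


# Toy data (made up): g1/g2 rise together, g3/g4 oscillate together.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],
    "g3": [4.0, 1.0, 3.5, 1.2],
    "g4": [8.1, 2.2, 6.8, 2.0],
}
modules = coexpression_modules(toy, threshold=0.9)
print(modules)
```

Lowering the cutoff in this toy example merges the two groups into one, which shows how strongly the resulting modules depend on the chosen threshold.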