| Genetic Programming (GP) is an evolutionary algorithm. Program trees, which are the solutions to a problem, also serve as descriptions of that problem. As long as the "function" and "terminal" sets that describe the problem are provided, GP can automatically combine them into hierarchical program trees. Expressing a problem description through a tree structure is a breakthrough that has broadened the range of GP applications. The way GP solves problems matches one of the important goals of computer science: making computer programs design themselves automatically to solve a problem. The study of GP has therefore become a hot research area in intelligent computing. As human genome research shifts its focus to functional genomics, the analysis of gene expression data is becoming a hot topic and one of the key issues in bioinformatics. Analyzing gene expression data yields information on gene function and gene expression, which is important for clinical diagnosis, determining drug efficacy, revealing disease mechanisms, and so on. However, because of the particular characteristics of gene expression data, such as large, high-dimensional data sets, noise, and a lack of prior knowledge, cluster analysis has become a major method for studying gene function and gene regulation information. The existing clustering methods nevertheless all have shortcomings to some degree. The main work and results of this paper include: 1) The GP algorithm is systematically reviewed. A new model, named the HS-model, is proposed for collecting statistics on and analyzing program subtrees in order to improve the efficiency of GP. The effectiveness of the HS-model is then demonstrated by solving the artificial ant problem. 2) A GP clustering system is proposed to handle large, high-dimensional data sets. In this system, the HS-model is used to collect statistics on and analyze the data set, providing information for GP clustering. An appropriate fitness function is also proposed. 
This system can largely eliminate the influence of data scale and dimensionality. 3) Building on research into the clustering of gene data, the proposed GP clustering system is applied to cluster yeast gene data. The clustering system can effectively handle the impact of missing gene expression data on clustering performance. A comparative analysis against the results obtained by biologists using hierarchical clustering and domain knowledge shows that the proposed system can automatically produce effective clustering results while economizing on time and space. |
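
As a minimal illustration of how GP combines a function set and a terminal set into hierarchical program trees, the following Python sketch grows a random tree and evaluates it. The particular functions, terminals, and helper names (`FUNCTIONS`, `TERMINALS`, `random_tree`, `evaluate`, `depth`) are assumptions chosen for illustration and are not taken from the work summarized above. The HS-model and the clustering fitness function are not reproduced here.

```python
import operator
import random

# Hypothetical function set: name -> (callable, arity).
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
}

# Hypothetical terminal set: one input variable and two constants.
TERMINALS = ["x", 1.0, 2.0]


def random_tree(max_depth, rng):
    """Grow a random program tree as nested tuples: (function_name, child, ...)."""
    # Stop at the depth limit, or early with a fixed (assumed) probability.
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    _, arity = FUNCTIONS[name]
    return (name,) + tuple(random_tree(max_depth - 1, rng) for _ in range(arity))


def evaluate(tree, x):
    """Evaluate a program tree for a given value of the variable x."""
    if isinstance(tree, tuple):
        func, _ = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, x) for child in tree[1:]))
    return x if tree == "x" else tree


def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


if __name__ == "__main__":
    rng = random.Random(0)
    tree = random_tree(3, rng)
    print(tree, "depth:", depth(tree), "value at x=2:", evaluate(tree, 2.0))
```

A hand-written tree such as `("add", "x", ("mul", 2.0, "x"))` encodes the expression x + 2x, so `evaluate` returns 9.0 for x = 3.0. In a full GP run, many such trees would be scored by a fitness function and recombined through crossover and mutation, which this sketch omits.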