A central challenge in cancer treatment is directing a specific therapy to a particular tumour so as to achieve a better therapeutic effect with lower toxicity. Cancer detection, or cancer classification, is therefore a key step in cancer therapy. For a long time, classification has relied on sample morphology, which is inadequate in many cases: tumours at different stages may present similar pathomorphology, and tumours with similar pathomorphology may respond differently to the same therapy. Cancer detection based on gene expression data has become an important area of cancer research. With the development of microarray technology, massive amounts of gene expression data are now produced, which support the exploration of complex gene regulatory networks, the investigation of functional genomics, and the study of cancer detection. However, gene expression data are characterized by high dimensionality, heavy noise, high redundancy and a nonequilibrium distribution, which pose challenges for developing the associated data mining techniques and for cancer detection.

This dissertation focuses on the analysis of gene expression data, with two major goals: mining gene expression modes and detecting cancer. We study the preprocessing of gene expression data, feature gene selection, the analysis of cancer-related gene expression models, and the construction of cancer detection models.

The main contributions of this dissertation are summarized as follows.

Firstly, the characteristics of gene expression profiles are analyzed, and a multi-step gene selection scheme based on CMST clustering is proposed. The Gap Statistic is then introduced into this feature gene selection to determine the number of feature genes, yielding a self-adaptive gene selection method that substantially improves on setting the number of feature genes arbitrarily.

Secondly, principal component analysis (PCA) and independent component analysis (ICA) are applied to the gene expression data to investigate the underlying regulatory factors and gene regulatory networks in cancer. Sampling is used to generate gene subsets; in the PCAP and ICAP of these subsets, non-informative features are removed and the gene expression modes are reconstructed. On this basis, a cancer detection method based on hidden gene expression models is presented.

Thirdly, the biological locality of gene expression with respect to a cancer is explored, and the concept of a cancer's relative space is proposed. Carcinogenic gene modes are extracted on the basis of this relative space, and their regulation is discussed. A cancer detection algorithm using relative gene expression modes is then proposed, which alleviates the "curse of dimensionality".

Fourthly, different feature selection methods use different search mechanisms and evaluation strategies, so they select distinct feature genes that reflect different aspects of a cancer, and classifiers built on these distinct gene sets give widely varying results. A group of complementary gene classifiers is therefore constructed, and an ensemble cancer classification algorithm is proposed.

Fifthly, gene co-expression and explainable emerging patterns are explored. Virtual samples are added to improve the discriminative power of emerging patterns, and, to make the emerging patterns more reliable, the cut-point selection strategy assumes that cut points follow a Gaussian distribution. On this basis, two emerging-pattern-based cancer detection methods are presented.
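The abstract does not spell out how the CMST clustering or the Gap Statistic step is carried out, so what follows is only a minimal sketch under stated assumptions, not the dissertation's method. It treats each gene as a point whose coordinates are its expression values across samples, substitutes plain minimum-spanning-tree (single-linkage) clustering for CMST, and assumes that one representative gene is kept per cluster, so that the number of clusters chosen by the Gap Statistic (in its usual formulation by Tibshirani, Walther and Hastie, with uniform reference data drawn over the bounding box) sets the number of feature genes. The function names `mst_clusters`, `within_dispersion`, `gap_statistic` and `representative_genes`, and the parameters `n_refs` and `seed`, are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]  # one gene: its expression values across samples


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mst_clusters(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical stand-in for CMST: build a minimum spanning tree with Prim's
    algorithm and cut its k-1 longest edges, leaving k components (1 <= k <= n)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each point into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.sqrt(sq_dist(points[u], points[v]))
                if d < best[v]:
                    best[v], parent[v] = d, u
    edges.sort()
    root = list(range(n))  # union-find over the n-k shortest tree edges

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, a, b in edges[: n - k]:
        root[find(a)] = find(b)
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def within_dispersion(points: Sequence[Point], labels: List[int]) -> float:
    """W_k: pooled within-cluster sum of squares around each cluster centroid."""
    groups: dict = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum(sq_dist(m, centroid) for m in members)
    return total


def gap_statistic(points: Sequence[Point], k_max: int, n_refs: int = 10, seed: int = 0) -> int:
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to k_max."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[j] for p in points) for j in range(dim)]
    hi = [max(p[j] for p in points) for j in range(dim)]
    refs = [[[rng.uniform(lo[j], hi[j]) for j in range(dim)] for _ in points]
            for _ in range(n_refs)]
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        # max(..., 1e-12) guards log(0) when clusters collapse to identical points
        log_w = math.log(max(within_dispersion(points, mst_clusters(points, k)), 1e-12))
        ref_logs = [math.log(max(within_dispersion(r, mst_clusters(r, k)), 1e-12)) for r in refs]
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean - log_w)
        sks.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k
    return k_max


def representative_genes(points: Sequence[Point], labels: List[int]) -> List[int]:
    """Assumed rule: keep the gene nearest each cluster centroid, one per cluster."""
    groups: dict = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)
    chosen = []
    for idx in groups.values():
        dim = len(points[idx[0]])
        centroid = [sum(points[i][j] for i in idx) / len(idx) for j in range(dim)]
        chosen.append(min(idx, key=lambda i: sq_dist(points[i], centroid)))
    return sorted(chosen)
```

For example, with `genes` holding one expression vector per gene, `k = gap_statistic(genes, k_max=10)` followed by `representative_genes(genes, mst_clusters(genes, k))` returns the indices of the k retained genes. The pure-Python O(n²) tree construction is meant only for illustration; an expression matrix with thousands of genes would call for a vectorised implementation.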