In living cells, tens of thousands of genes are expressed in stable patterns that maintain the normal state; when cells enter a disease state, these expression patterns change in response. This phenomenon is referred to as differential gene expression, and differentially expressed genes carry rich information about abnormal cells. Gene chip technology is widely used in biomedical research. It can measure the expression levels of tens of thousands of genes in a cell simultaneously, and these measurements contain rich information about gene activity.
Radiotherapy is one of the effective methods of tumor treatment in clinical practice: about 70% of cancer patients need radiotherapy during treatment, and 40% of radiotherapy patients can be cured completely. A large number of studies have found that a patient's response to radiotherapy depends mainly on that individual's radiation sensitivity. The intrinsic radiation sensitivity of a tumor is closely related to gene expression and gene regulation. Therefore, once radiation-sensitive genes are known, radiotherapeutic outcomes can be greatly improved by deciding which patients are suitable for radiotherapy and by applying an appropriate radiation dose. Tumor gene expression data are typically characterized by high dimensionality and small sample size. Mining the underlying genetic information by identifying radio-sensitive genes plays an important role in revealing the pathogenesis of tumors and in applying radiotherapy effectively to cancer patients.
This dissertation focuses on developing computational methods for identifying radio-sensitive genes. The main work can be summarized in the following aspects:
First, we developed two constrained regression-based methods, nonnegative least squares and Elastic Net, and two network-based methods, KEGGSDRW and PPISDRW, for the identification of radio-sensitive genes.
Considering the non-negativity of gene expression, the solution obtained by nonnegative least squares regression is more meaningful than the solution obtained by ordinary least squares regression. Elastic Net is a recently developed regression method that is mathematically subject to both lasso and ridge constraints. KEGGSDRW and PPISDRW are based on a priori network information and are constructed by combining the directed random walk (DRW) with Spearman correlation. DRW runs over a network and assesses the importance of each gene in terms of network topology. Second, we applied the above methods to identify radiosensitivity genes from real NCI-60 gene expression data. In the classification evaluation, the radiosensitivity genes identified by the Elastic Net, KEGGSDRW, and PPISDRW methods showed good classification performance. Third, we conducted pathway enrichment analysis based on the hypergeometric distribution. The results showed that the network-based methods had a clear advantage in enriching disease-related pathways, i.e., the significant pathways found by the network-based methods are mostly concentrated in the human disease functional module. To further verify the reliability of the results, we also conducted correlation enrichment analysis on the radio-sensitive genes identified by the above methods. The results showed that gene correlations are significantly enriched among the genes identified by these methods.
Finally, we conclude the dissertation and discuss work that should be done in the future.
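The pathway enrichment analysis based on the hypergeometric distribution, mentioned in the third aspect, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the toy gene identifiers, and the choice of a one-sided upper-tail p-value are assumptions made here for illustration only.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_pathway, n_selected, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric variable X.

    X counts how many of n_selected genes, drawn without replacement from
    n_background genes, fall into a pathway containing n_pathway genes.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment_pvalue(selected_genes, pathway_genes, background_genes):
    """Enrichment p-value of one pathway for a list of selected genes.

    Genes outside the background set are ignored (an assumption of this
    sketch, so that all counts refer to the same gene universe).
    """
    background = set(background_genes)
    selected = set(selected_genes) & background
    pathway = set(pathway_genes) & background
    overlap = len(selected & pathway)
    return hypergeom_upper_tail(
        len(background), len(pathway), len(selected), overlap
    )


# Toy example with hypothetical gene identifiers (not real data).
background = [f"g{i}" for i in range(10)]
pathway = ["g0", "g1", "g2"]
selected = ["g0", "g1", "g5", "g6"]
p_value = pathway_enrichment_pvalue(selected, pathway, background)
```

In practice such a p-value would be computed for every candidate pathway and corrected for multiple testing; the correction step is omitted from this sketch.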