Research On Application Of Data Mining In Prediction Of Cancer Biomarkers And In Analysis Of Cancer Initiation Mechanism

Posted on: 2016-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Yao
GTID: 1224330482954739
Subject: Computer Science and Technology
Abstract/Summary:
Cancer is a highly complex disease and one of the major threats to human health. Grade and stage are two characteristics used to measure cancer severity. Stage reflects the size of a tumor and the extent of its invasion, and has traditionally been determined by pathologists based on tumor size, nodal spread and metastasis. A widely used staging system classifies cancer tissues into four stages, I, II, III and IV, with a higher stage representing a more advanced cancer. Grade is a measure of malignancy and aggressiveness that is independent of stage. Unlike staging, grading has predominantly been done through visual inspection of cell morphology and tissue structure. There is currently no universal grading system for all cancer types, but grading systems generally classify cancer tissues into four grades: well differentiated, moderately differentiated, poorly differentiated and undifferentiated.

Breast cancer is a major threat to women’s health, accounting for 22.9% of cancer cases in women. According to the World Cancer Report, 458,503 breast cancer-associated deaths were reported worldwide in 2008, representing 13.7% of cancer-related deaths in women. It is generally understood that breast cancers, and probably other cancer types as well, of different stages and grades require different treatment plans. For example, breast-conserving surgery plus radiation therapy is effective for most patients with early-stage breast cancer, while systemic therapy, such as hormone therapy or chemotherapy, is generally needed for advanced-stage patients in addition to cancer-removal surgery and radiation. Cancer grade is also strongly associated with prognosis: more differentiated cancers tend to have a more favorable prognosis.
Clearly, correct classification of the grade and stage of a cancer has significant implications for determining a patient’s treatment plan.

Although cancer research has made a number of breakthroughs in recent years, such as the identification of many cancer-related markers, the mechanism of cancer initiation remains an open question. A large body of research data shows that inflammation is a critical component of cancer initiation and progression. Many cancers arise from sites of infection, chronic irritation and inflammation. It is now becoming clear that the cancer microenvironment, which is largely orchestrated by inflammatory cells, is an indispensable participant in the neoplastic process, fostering proliferation, survival and migration. In addition, cancer cells have co-opted some of the signaling molecules of the innate immune system, such as selectins, chemokines and their receptors, for invasion, migration and metastasis. It is now clear that cell proliferation alone does not cause cancer. However, sustained cell proliferation in an environment rich in inflammatory cells, growth factors, activated stroma and DNA-damage-promoting agents potentiates and promotes neoplastic risk. Current theories attribute inflammation-induced cancer to specific tissue-repair signals, such as those for cell proliferation, survival, angiogenesis and pro-inflammation, and to genomic instability induced by over-production of reactive oxygen species (ROS), mostly hydrogen peroxide (H2O2) and superoxide (O2-), by innate immune cells. Today, the causal relationship between inflammation, innate immunity and cancer is more widely accepted; however, many of the molecular and cellular mechanisms mediating this relationship remain unresolved and need further study.

With the development of RNA-Seq and microarray technology, large volumes of gene expression data have become available, presenting both an opportunity and a challenge for bioinformatics researchers.
Based on these gene expression data, we applied data mining methods to predict breast cancer-related gene signatures and protein markers, and to construct and analyze a mechanistic model of cancer initiation from chronic inflammation. The main contents of this research are as follows:

1. Identification of gene signatures and protein markers for breast cancer and its grades and stages

We present a computational method for predicting gene signatures and blood/urine protein markers for breast cancer and its grades and stages based on RNA-Seq data. The data were retrieved from the TCGA breast cancer dataset and cover 111 pairs of cancer and matching adjacent noncancerous tissues with pathologist-assigned stages and grades. By applying differential expression analysis and an SVM-based classification approach, we found that 324 and 227 genes have expression levels consistently up-regulated in cancer versus their matching controls in a grade- and stage-dependent manner, respectively. Using these genes, we predicted a 9-gene panel as a signature for distinguishing poorly differentiated from moderately and well differentiated breast cancers, with 96.3% classification accuracy, 94.5% sensitivity and 97.3% specificity, and a 19-gene panel as a signature for discriminating between moderately and well differentiated breast cancers, with 94.2% classification accuracy, 95.0% sensitivity and 92.2% specificity. Similarly, a 30-gene panel and a 21-gene panel are predicted as signatures for distinguishing advanced-stage (stages III-IV) from early-stage (stages I-II) samples and for distinguishing stage II from stage I samples, respectively. Their classification accuracies (sensitivity, specificity) are 99.9% (99.5%, 100%) and 98.0% (99.7%, 91.3%), respectively. We expect that these gene panels can be used as gene-expression signatures for cancer grade and stage classification.
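For readers less familiar with the three figures reported for each gene panel, the following is a minimal sketch of how accuracy, sensitivity and specificity are conventionally computed from true and predicted class labels. The function name `classification_metrics` and the toy labels are illustrative assumptions; they are not taken from the dissertation, and the sketch does not reproduce its SVM pipeline or its evaluation protocol.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for a binary classifier.

    `positive` marks the class treated as "positive" (for example, the
    poorly differentiated or advanced-stage class). Hypothetical helper,
    not the author's code.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # positive sample correctly called positive
            else:
                fp += 1  # negative sample wrongly called positive
        else:
            if truth == positive:
                fn += 1  # positive sample missed
            else:
                tn += 1  # negative sample correctly called negative

    total = tp + tn + fp + fn

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look high when one class dominates the sample set.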
In addition, of the 324 grade-dependent genes, 188 and 66 encode proteins that are predicted to be blood-secretory and urine-excretory, respectively; and of the 227 stage-dependent genes, 123 and 51 encode proteins predicted to be blood-secretory and urine-excretory, respectively. We anticipate that some combinations of these blood and urine proteins could serve as markers for monitoring breast cancer at specific grades and stages through blood and urine tests.

2. Analysis of the mechanism of cancer initiation from chronic inflammation

Based on microarray data, we used computational methods to analyze the mechanism of cancer initiation from chronic inflammation. From a literature search, we collected 18 chronic inflammatory diseases. We divided these diseases into two groups, cancer-prone inflammatory diseases (CPDs) and cancer-independent inflammatory diseases (CIDs), based on their cancer risk ratio. We further divided the cancer-prone diseases into highly cancer-prone (HCPDs) and moderately cancer-prone (MCPDs). We downloaded 23 gene expression datasets covering the 18 chronic inflammatory diseases from the GEO database and carried out the following analyses. First, we identified differentially expressed genes and analyzed their pathway enrichment. We found that 75% of the dysregulated pathways in CPDs are related to inflammation, tissue repair, immune response and oxidation-reduction reactions. Second, based on 226 gene expression datasets covering 11 immunity- and tissue repair-associated cell types, we developed a principal component (PC) regression-based cell type deconvolution method. Applying this method to our chronic inflammation data, we found that increased CD4 T cell and macrophage populations are consistently observed in all HCPDs, while decreased CD8 T cell and monocyte populations are consistently observed in most HCPDs. Third, based on 10 hypoxia treatment gene expression datasets, we developed a predictor of the hypoxia level of a tissue.
Applying this predictor to our inflammation data, we found that the cancer-associated diseases are generally more hypoxic than the cancer-independent diseases. Fourth, by applying a predictor of oxidative stress level to our inflammation data, we found that the increase in oxidative stress is significantly greater in CPDs than in CIDs. By analyzing iron metabolism and mitochondrial function associated with oxidative stress, we found distinct patterns of dysregulated iron ion metabolism genes in CPDs versus CIDs, and consistently more down-regulated mitochondrial genes in CPDs than in CIDs. Fifth, by analyzing the function of glycosaminoglycans and other components of the extracellular matrix in CPDs, we found that they may serve as important material for balancing various reactive oxygen species. Finally, based on the above results, we constructed and analyzed a mechanistic model of cancer initiation from chronic inflammation using correlation analysis.
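The abstract does not describe how the PC-regression-based cell type deconvolution is implemented. As a rough illustration of the general idea only, the sketch below regresses a mixed-tissue expression profile onto the leading principal directions of a reference signature matrix and maps the result back to per-cell-type weights. Everything here is an assumption made for illustration: the function name `pc_regression_deconvolve`, the use of an uncentered SVD, the clipping of negative weights, the normalization to fractions, and the synthetic data.

```python
import numpy as np


def pc_regression_deconvolve(signatures, mixture, n_components):
    """Estimate cell-type fractions by principal component regression.

    signatures   : (genes x cell_types) reference expression matrix
    mixture      : (genes,) expression profile of the mixed tissue
    n_components : number of leading principal directions to keep

    Hypothetical sketch, not the dissertation's method.
    """
    S = np.asarray(signatures, dtype=float)
    m = np.asarray(mixture, dtype=float)

    # Uncentered SVD of the signature matrix: S = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(n_components, len(s))

    # Regress the mixture on the top-k components, then map the
    # component coefficients back to cell-type weights.
    scores = U[:, :k].T @ m
    weights = Vt[:k].T @ (scores / s[:k])

    # Assumption: negative weights are clipped and the rest rescaled
    # so the estimates can be read as fractions.
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    return weights / total if total > 0 else weights


# Synthetic example: 50 genes, 3 cell types, known mixing fractions.
rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 10.0, size=(50, 3))
true_fractions = np.array([0.5, 0.3, 0.2])
mixed_profile = reference @ true_fractions
print(pc_regression_deconvolve(reference, mixed_profile, n_components=3))
```

Keeping fewer components than cell types trades some bias for stability when reference profiles of related cell types are strongly correlated, which is the usual motivation for principal component regression over ordinary least squares.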
Keywords/Search Tags: Data Mining, Breast Cancer, Gene Biomarker, Protein Biomarker, Chronic Inflammation, Deconvolution, Reactive Oxygen Species, Hypoxia