
Feature Genes Selection And Gene Sets Enrichment Analysis For Histologic Grading Of Breast Cancer

Posted on: 2011-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Ye
GTID: 1114360308469849
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Breast cancer is the most common cancer in women worldwide and the leading cause of cancer death among women. Although the mortality rate has stabilized or is decreasing, the incidence of breast cancer is still rising across Western countries, and in recent years it has also been gradually increasing in Asia. The etiology of breast cancer involves hereditary, hormonal, immunological, and environmental factors, including physical, chemical, and biological exposures as well as lifestyle.
Histologic grading of breast cancer has long been recognized. The most studied and most widely used method of breast tumor grading is the Bloom-Richardson grading system, also known in its modified form as the Nottingham Grading System. The Nottingham Grading System is based on microscopic evaluation of the morphologic and cytologic features of tumor cells: the degree of tubule formation, nuclear pleomorphism, and mitotic count. The sum of these three scores stratifies breast tumors into grade 1 (G1; well differentiated, slow growing), grade 2 (G2; moderately differentiated), and grade 3 (G3; poorly differentiated, highly proliferative) malignancies. Multiple studies have shown that the grade of invasive breast cancer is a powerful indicator of disease recurrence and patient death, independent of lymph node status and tumor size. Untreated patients with G1 disease have a 5-year survival rate of 95%, whereas those with G2 and G3 malignancies have 5-year survival rates of 75% and 50%, respectively. The histologic grade of breast carcinomas has thus long provided clinically important prognostic information. However, histologic grading shows persistent inconsistencies between institutions and between pathologists.
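As a rough illustration of how the three component scores combine into a grade, the sketch below assumes the conventional Nottingham cutoffs, which are not stated in this abstract: each component is scored 1 to 3, and a total of 3 to 5 maps to G1, 6 to 7 to G2, and 8 to 9 to G3. The function name `nottingham_grade` and its parameter names are hypothetical.

```python
def nottingham_grade(tubule_score: int, pleomorphism_score: int, mitotic_score: int) -> str:
    """Map three component scores (each 1-3) to a histologic grade.

    The cutoffs are the conventional Nottingham ones (an assumption here;
    the abstract only says that the scores are summed).
    """
    scores = (tubule_score, pleomorphism_score, mitotic_score)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component score must be 1, 2, or 3")
    total = sum(scores)  # ranges from 3 to 9
    if total <= 5:
        return "G1"
    if total <= 7:
        return "G2"
    return "G3"
```

Because the grade depends on three subjective scores, a one-point disagreement on a single component can move a tumor across a cutoff, for example from a total of 5 (G1) to 6 (G2).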
With the advent of new unified methods, such as the Elston and Ellis modification of the Bloom and Richardson method, the reproducibility of histologic grading has been investigated and found to range from 60% to 85%.
The genome-wide expression pattern of a tumor represents its biology, and the diversity of these patterns reflects biological diversity. Gene expression profiling has been used to develop genomic tests that may predict clinical outcome better than traditional clinical and pathological standards. It has brought new insights into breast cancer biology and prognosis and has shown promise in refining clinical decision making. Feature genes can be obtained from gene expression profiles to predict histologic grade in breast cancer. Some researchers have identified gene expression signatures that predicted outcome more accurately than conventional prognostic indicators, and these signatures were validated in a follow-up study. However, this validation was imperfect: the training and validation cohorts had overlapping patients, and external validation using independent data sets was not performed. Furthermore, most genes in a gene expression profile are not relevant to discriminating between samples. Such genes increase the dimensionality of the discrimination problem and its computational complexity, and they introduce noise if they are included in grouping. A small gene subset that classifies samples correctly helps biomedical researchers explore the functions of these genes and develop an inexpensive microarray for cancer diagnosis. Accordingly, feature extraction is essential for obtaining a small and accurate gene subset in microarray data analysis.
The differentiation level (or grade) of human tumors is assessed routinely in the clinic, with poorly differentiated tumors generally having the worst prognoses.
However, this classification is based on histopathological criteria, and the underlying molecular pathways controlling tumor differentiation are poorly described. The hallmark traits of stem cells, self-renewal and differentiation capacity, are mirrored by the high proliferative capacity and phenotypic plasticity of tumor cells. Moreover, tumor cells often lack the terminal differentiation traits possessed by their normal counterparts. These parallels have given rise to the hypothesis that tumors often arise from undifferentiated stem or progenitor cells. A number of oncogenes are known to interfere with normal cell differentiation, and such oncogenes could also affect tumor cell differentiation, suggesting that the regulatory networks controlling the function of stem cells may also be active in certain tumors.
We examined whether histologic grade is associated with the gene expression profiles of breast cancers and whether such profiles could be used to improve histologic grading. We used a recently developed gene set analysis method, gene set enrichment analysis (GSEA), to assess whether the expression signatures and regulatory networks that define human embryonic stem (ES) cell identity are also active in human tumors.
This thesis is divided into three parts.
Part Ⅰ: Quality control of microarray data
Gene expression data for this study were obtained from published datasets in the National Center for Biotechnology Information (NCBI) GEO database (http://www.ncbi.nlm.nih.gov/geo/; accession numbers GSE2109, GSE5460, GSE1456, and GSE3494). Two cohorts of patients included in this study were based on platform GPL570. Data preprocessing and normalization were done with the dChip package. Expression values were generated in dChip using a model-based expression algorithm with the perfect match/mismatch (PM/MM) model. We used a two-step filtering strategy to remove noise while retaining true biological information. The first step was to obtain the scanned chip images and the array summary report.
Samples without clinical data and those with an array-outlier or single-outlier percentage above 5% were excluded. In total, 676 breast cancer microarray samples were retained for further analysis: 186 from GSE2109, 109 from GSE5460, 147 from GSE1456, and 234 from GSE3494. The second step was to remove batch effects with an empirical Bayes method, because the samples came from different laboratories. After that, all expression values were log2 transformed. Genes were then filtered as follows: Variation across samples: 0.5 < Standard deviation/Mean...
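The log2 transformation and the variation filter described above could be sketched as follows. This is a minimal illustration, not the dChip implementation. The function names are hypothetical. The upper bound of the filter is truncated in this abstract, so it is left as an optional parameter with no default value. Computing the ratio on untransformed values, and filtering before the transform, are assumptions made for the example only.

```python
import math
import statistics


def log2_transform(values):
    """Return log2 of each expression value (values must be positive)."""
    return [math.log2(v) for v in values]


def passes_variation_filter(values, lower=0.5, upper=None):
    """Check lower < SD/mean (and SD/mean < upper, if an upper bound is given).

    lower=0.5 comes from the abstract; the upper bound is not legible there,
    so it stays None (no upper bound) unless the caller supplies one.
    """
    mean = statistics.mean(values)
    if mean <= 0:
        return False  # ratio undefined or not meaningful
    cv = statistics.stdev(values) / mean  # sample standard deviation
    if cv <= lower:
        return False
    return upper is None or cv < upper


def filter_genes(expression, lower=0.5, upper=None):
    """Keep genes (dict: gene -> per-sample values) that pass the filter."""
    return {
        gene: values
        for gene, values in expression.items()
        if passes_variation_filter(values, lower, upper)
    }


# Toy data with made-up gene names and values, for illustration only.
toy = {
    "gene_a": [100.0, 400.0, 50.0, 800.0],   # varies strongly across samples
    "gene_b": [200.0, 210.0, 190.0, 205.0],  # nearly constant
}
kept = filter_genes(toy)
log2_kept = {gene: log2_transform(values) for gene, values in kept.items()}
```

On the toy data only `gene_a` survives, since the ratio for `gene_b` is far below 0.5. A filter of this kind discards genes that barely change across samples and therefore cannot help separate grades.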
Keywords/Search Tags:Breast cancer, Support vector machines, Gene set enrichment, Histologic grade, Human embryonic stem cell