
Data Analysis And Visualization Platform For Breast Cancer Genomics Data

Posted on: 2018-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zou
GTID: 2334330542961636
Subject: Computer technology
Abstract/Summary:
Breast cancer is a genetically and clinically highly heterogeneous disease. The effectiveness of a particular treatment varies among breast cancer patients. There are several widely accepted methods for dividing breast cancers into subtypes, such as histopathological classification based on morphological features and classification by the expression level of immunohistochemical biomarkers such as ER, PR and HER2. The differences in gene expression patterns among these subtypes reflect changes in the cell biology of the tumor and are associated with significant variation in clinical features such as overall survival and disease-free survival. Based on these problems and methods, this thesis carries out a series of studies on tumor molecular typing and related data analysis. The main contents are as follows:

(1) We first developed a method for classifying breast cancer patients based on somatic mutation data. We analyzed the whole-exome sequencing data of breast cancer patients provided by TCGA. First, an algorithm called CADD was used to assess the effect of each gene mutation on biological function. A subset of mutated genes was then extracted by a feature selection method. Based on these genes and their CADD scores, all samples were divided into three categories by non-negative matrix factorization, and the correlation between the three categories and the patients' clinical features was further evaluated. The results showed that the selected features were more significant for the clinical classification of tumor patients.

(2) To help researchers make fuller use of multi-omics data for tumor biomarker identification and related data analysis, our team developed a visualization platform for breast cancer genomics data. The database was constructed by integrating multi-omics data with the clinical features of breast cancer patients. It provides a search function for individual genes, a filtering function for the various datasets, real-time analysis of transcriptome and CNV data, and visualization of three types of data: miRNA, KEGG biological pathways and gene function networks.

In this work, we not only attempted to classify breast cancer patients based on somatic mutation data, but also developed a useful platform for researchers to carry out data integration, data analysis and visualization. The platform can help researchers gain insight into breast cancer molecular subtyping and biomarker discovery, and explore individualized treatment schemes that address tumor heterogeneity among patients.
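
The clustering step in part (1) can be illustrated with a minimal sketch. The abstract does not say which NMF implementation was used or how samples were assigned to categories, so everything here is an assumption: the code uses the standard multiplicative-update rule for the Frobenius objective, assigns each sample to the factor with the largest coefficient, and runs on a made-up toy matrix standing in for a genes-by-samples table of CADD scores. The names `nmf`, `assign_clusters` and `scores` are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    Uses multiplicative updates for the squared Frobenius error.
    This is an illustrative choice, not necessarily the thesis's method.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps    # gene loadings per factor
    H = rng.random((k, n_samples)) + eps  # factor weights per sample
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_clusters(H):
    """Assign each sample (column of H) to its dominant factor (assumed rule)."""
    return H.argmax(axis=0)


# Toy stand-in for a CADD score matrix: 6 genes x 9 samples, where three
# groups of samples carry high-scoring mutations in different gene pairs.
scores = np.full((6, 9), 0.1)
scores[0:2, 0:3] = 20.0
scores[2:4, 3:6] = 20.0
scores[4:6, 6:9] = 20.0

W, H = nmf(scores, k=3)      # k=3 mirrors the three categories in the abstract
labels = assign_clusters(H)  # one integer label in {0, 1, 2} per sample
print(labels)
```

On real data, the resulting labels would then be compared against clinical variables (for example with a survival analysis) to evaluate the correlation described above.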
Keywords/Search Tags: breast cancer subtyping, TCGA, data mining, nonnegative matrix factorization, data visualization