
Identification Of Differential Expressed Genes And Differential Networks Based On Microarray Data

Posted on: 2019-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: NAPAGODA ARACHCHIGE DISNA NAMA
GTID: 1360330548971470
Subject: Mathematical Statistics
Abstract/Summary:
Identifying differentially expressed genes and their interactions across different states with robust and accurate feature selection methods plays an important role in disease diagnosis and prognosis. Analysing genomic data with very high dimensionality and small sample sizes is a major challenge. Several mathematical and statistical methods have been developed to detect differential expression and co-expression relationships among genes in high-dimensional data. This dissertation presents results from two research studies. The first develops a statistical model for identifying differentially expressed genes in microarray data, and the second addresses gene co-expression using a new statistical model.

Currently, most methods work on a gene-by-gene basis within a parametric hypothesis testing framework to determine whether a gene is differentially expressed in microarray cancer data. In this research, we present several standard statistical machine learning techniques for analysing high-dimensional genetic data. Because microarray data are noisy, cancer-associated genes cannot be detected directly from the expression values in each data set. Therefore, new statistical methods for the analysis of gene expression in microarray data are needed. In this study, we use Student's t statistic to analyse genes in publicly available breast cancer data. One of the main disadvantages of the t statistic is that it cannot establish whether a gene's expression differs significantly across individual data sets, because the same gene can have different t values in different data sets. To overcome this problem, the t values from all breast cancer data sets are combined into a single matrix. We develop an Integrated Multivariate Group Sparse (IMGS) model, based on the combined Student's t statistics of multiple independent data sets, to identify differentially expressed genes in breast cancer. Furthermore, stability selection is applied to identify the optimal values of the tuning parameter in the IMGS method. This study also reviews a few meta-analysis methods and provides a comparative study of the IMGS model, Student's t statistic and the GeneMeta model on breast cancer data. In this comparison, IMGS performs better than Student's t statistic and GeneMeta. Even though all three methods can identify a reasonable fraction of truly differentially expressed genes in high-dimensional data, we conclude that IMGS is the more appropriate statistical approach for identifying the most significant genes for further biological and medical analysis of gene expression data.

The second study develops a new statistical approach for identifying common and specific hub genes in the gene co-expression networks of pan-cancer data. To this end, the Integrated Differential Co-expression Group (IDCG) model has been developed to construct co-expression relationships among genes from multiple gene expression data sets. The IDCG model can also be employed to identify distinctive patterns of co-expressed genes in pan-cancer data sets. As an initial step in building the IDCG model, Pearson's correlation, a simple and well-known measure, is used to quantify correlations between gene expression profiles, and the Fisher z-transformation is applied to them. By integrating pan-cancer data sets, the IDCG model has greater potential to contribute to the methodological basis of statistical analysis for discovering biologically related genes in different cancers. Stability selection is again employed to identify the optimal values of the tuning parameters in the IDCG method. One of the most informative uses of gene co-expression data is to gain a better understanding of the function of hub genes in each cancer. The IDCG approach builds on the expectation that genes with the same function should have similar or correlated expression patterns. Furthermore, the IDCG model points to underlying molecular mechanisms in large-scale cancer studies, beyond the simple visualization of patterns of differentially co-expressed genes.
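
The two preprocessing steps named in the abstract, combining per-data-set t values into one matrix and applying the Fisher z-transformation to Pearson correlations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the function names, the input layout (genes in rows, samples in columns, same gene order in every data set) and the pooled-variance form of Student's t are assumptions, and the IMGS and IDCG models themselves are not reproduced here.

```python
import numpy as np


def t_statistic_matrix(datasets):
    """Build a genes x data-sets matrix of two-sample Student's t statistics.

    `datasets` is a list of (case, control) pairs. Each array is
    genes x samples and all arrays share the same gene order
    (an assumption made for this sketch).
    """
    columns = []
    for case, control in datasets:
        n1, n0 = case.shape[1], control.shape[1]
        mean_diff = case.mean(axis=1) - control.mean(axis=1)
        # Pooled variance, as in the classical equal-variance Student's t.
        pooled = ((n1 - 1) * case.var(axis=1, ddof=1)
                  + (n0 - 1) * control.var(axis=1, ddof=1)) / (n1 + n0 - 2)
        columns.append(mean_diff / np.sqrt(pooled * (1.0 / n1 + 1.0 / n0)))
    return np.column_stack(columns)


def fisher_z_correlations(expr, eps=1e-6):
    """Return the genes x genes matrix z = arctanh(r) of Pearson correlations.

    `expr` is genes x samples. The diagonal is set to 0 and |r| is clipped
    below 1 so that arctanh stays finite (a choice made for this sketch).
    """
    r = np.corrcoef(expr)          # rows are treated as variables (genes)
    np.fill_diagonal(r, 0.0)
    r = np.clip(r, -1.0 + eps, 1.0 - eps)
    return np.arctanh(r)


# Synthetic data only, to show the shapes involved.
rng = np.random.default_rng(0)
toy_datasets = [
    (rng.normal(size=(5, 8)), rng.normal(size=(5, 6))),
    (rng.normal(size=(5, 10)), rng.normal(size=(5, 7))),
]
T = t_statistic_matrix(toy_datasets)          # one column per data set
Z = fisher_z_correlations(toy_datasets[0][0])  # one row and column per gene
print(T.shape, Z.shape)
```

Each row of the t matrix collects one gene's evidence from every data set, which is the kind of input a group-sparse model can act on jointly. The Fisher z-transformation makes correlation estimates from data sets of different sizes more comparable, because the transformed value has an approximately normal sampling distribution.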
Keywords/Search Tags:Co-expression network, Differentially expressed genes, Meta-analysis, Pan-cancer, Stability selection