The Study Of The Relationships Between Epigenetic Modifications And Gene Expression In Human Embryonic Stem Cell

Posted on: 2017-04-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W X Su    Full Text: PDF
GTID: 1220330485466597    Subject: Biology
Abstract/Summary:
The expression of eukaryotic genes is a complex biological process regulated by many genetic and epigenetic factors. As understanding of growth, development, and disease has deepened, research attention has increasingly turned to epigenetics. High-throughput sequencing has brought epigenetic research into the age of big data, and the analysis of such data depends on bioinformatic methods. In this work, high-throughput sequencing data on epigenetic modifications and gene expression in human embryonic stem cells are selected. The interaction network between histone modifications and gene expression is constructed. The differences in histone modification profiles between highly and lowly expressed genes are analyzed. The effects of CG content on histone modification profiles are studied. A new machine learning method that combines epigenetic modifications (histone modifications, DNA methylation, etc.) and trinucleotide composition with support vector machines is developed to classify highly and lowly expressed genes.

The main conclusions are as follows:

1. The Pearson correlation coefficients between histone modifications and gene expression are studied, and the results indicate that most of the histone modifications activate gene expression. Based on partial correlation coefficients, the interaction network between histone modifications and gene expression is constructed. In this network, 11 edges show relatively high positive correlation coefficients between histone modifications, and 7 histone modifications correlate directly with gene expression, while the other histone modifications control gene expression indirectly by interacting with these 7 histone modifications.

2. The DNA regions flanking the transcriptional start sites, together with the promoter, 5'UTR, exon, intron, and 3'UTR regions, are very important for controlling gene expression, so the distributions of histone modifications in these regions are compared between highly and lowly expressed genes. The results indicate that the histone modifications follow four types of distributions in these regions, and that the distributions differ markedly between highly and lowly expressed genes. Histone modifications are mostly located in the promoters of highly expressed genes but in the exons of lowly expressed genes. The correlations between histone modifications in the promoters of highly expressed genes differ from those in the exons of lowly expressed genes. The boxplots of each histone modification in the five functional regions show that the range of normalized read counts is smallest in exons, which have a more stable chromosome state.

3. The histone code is also very important for gene expression, so the histone codes in the DNA regions flanking the transcriptional start sites of highly and lowly expressed genes are studied. A total of 5 histone modification clusters are found, and the clusters differ between the two types of genes. The correlations between different bins are calculated for each histone modification. The results show that the correlations between bins overlapped by histone modification peaks are higher than the correlations between bins not overlapped by peaks. Therefore, the peaks of a histone modification can be recognized by finding the regions where that modification shows higher correlations.

4. The type specificity and regional bias of histone modifications are studied for 11 key transcription factor genes that are critically important to stem cell renewal. Both the type specificity and the regional bias of the histone modifications differ among these genes. H3K4me2 and H3K4me3 are the two most important modifications, and both are preferentially located in promoters.

5. CG content affects histone modification profiles, so the promoters are separated by CG content, and the histone modification profiles of high-CG-content and low-CG-content promoters are studied. The results show that most histone modifications occur more often in high-CG-content promoters. Each type of promoter has two different histone modification clusters, and one conserved cluster containing 7 histone modifications is shared between the two types of promoters. Most of the key transcription factor genes that are critically important to stem cell renewal have high-CG-content promoters. H3K4me2, H3K4me3, and H3K36me3 are the three most important modifications for these genes.

6. A new machine learning method that combines histone modifications, DNA methylation, DNA accessibility, transcription factors, and trinucleotide composition with support vector machines is developed to classify highly and lowly expressed genes. The predictive accuracy of the model improves as these information parameters are added, and the model containing all of them performs best. The predictive accuracy and Matthews correlation coefficient of the best model reach 95.96% and 0.92 in the 10-fold cross-validation test, and 95.58% and 0.92 in the independent dataset test, respectively. The model provides a good way to judge whether a gene is highly or lowly expressed by using genetic and epigenetic data.
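The accuracy and Matthews correlation coefficient (MCC) reported in conclusion 6 are both computed from a binary confusion matrix. The sketch below shows the standard definitions of the two metrics. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the dissertation's data, and the abstract does not state the actual class sizes or error counts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of genes assigned to the correct class."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 50 highly and 50 lowly expressed genes,
# with 2 misclassified genes in each class.
tp, tn, fp, fn = 48, 48, 2, 2
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

In this balanced, symmetric example the MCC equals twice the accuracy minus one. Under that same assumption, an accuracy near 96% and an MCC near 0.92 are mutually consistent, though whether the dissertation's datasets were balanced is not stated in the abstract.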
Keywords/Search Tags: epigenetic modification, highly expressed gene, lowly expressed gene, high CG content promoter, low CG content promoter, support vector machine