
Identification And Function Analysis Of Multiple Sclerosis Related LncRNAs Based On GWAS And RNA-seq Data

Posted on: 2021-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Han
GTID: 1480306107478874
Subject: Chemical Engineering and Technology
Abstract/Summary:
Multiple sclerosis is an immune-mediated neurodegenerative disease characterized by inflammation and demyelination in the central nervous system (CNS). As of 2013 (the latest figures available), an estimated 2.3 million people worldwide were living with multiple sclerosis. Long noncoding RNAs (lncRNAs) have recently been recognized as important regulators in the pathogenesis of multiple sclerosis, and a few multiple sclerosis-related lncRNAs (e.g., linc-MAF-4, NeST and lnc-OPC) and their functions in the disease have been identified using molecular biology techniques. Meanwhile, genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphisms (SNPs) significantly associated with multiple sclerosis. However, although the etiology of multiple sclerosis has been extensively explored, the key pathogenic factors that involve genetic variants and transcriptional regulation in non-coding genomic regions remain poorly understood. Moreover, compared with other complex diseases such as cancer, relatively few multiple sclerosis-related lncRNAs have been identified, and their functions remain insufficiently characterized. In this study, we therefore first systematically explored, at genome-wide scale, the influence of SNPs on lncRNAs in multiple sclerosis. Then, based on known multiple sclerosis-associated SNPs and the principle of linkage disequilibrium (LD), we developed an analysis pipeline to predict novel potential multiple sclerosis-related lncRNAs. Next, we tested these predictions with an expression-based meta-analysis integrating RNA-seq data from multiple studies. Finally, we analyzed a multiple sclerosis-related single-cell RNA-seq (scRNA-seq) dataset to explore the function and distribution of these lncRNAs in oligodendrocytes. The details are as follows.

First, using a bioinformatics strategy, we simultaneously obtained genome-wide lncRNA expression and SNP genotype data from an RNA-seq dataset of 142 samples (51 multiple sclerosis patients and 91 controls), and then conducted an expression quantitative trait loci (eQTL) analysis. In total, we identified 2,383 differentially expressed lncRNAs that are specifically expressed in brain-related tissues; the expression of 517 of these lncRNAs is affected by 1,054 SNPs located within or near them. Next, we assessed the functional characteristics and disease specificity of these cis-eQTL SNPs, as well as the changes they cause in the secondary structure of the corresponding lncRNAs. The cis-eQTL SNPs were substantially and specifically enriched in intergenic regions and in SNP sets associated with neurological diseases (including multiple sclerosis), and they significantly altered the secondary structure of approximately 17.6% of the lncRNAs whose expression they affect.

Second, building on these findings, we used known GWAS-identified multiple sclerosis-related SNPs and the principle of linkage disequilibrium to predict novel potential multiple sclerosis-related lncRNAs. We collected 12,025 multiple sclerosis-related SNPs from four authoritative databases and used the HaploReg tool to identify 111,581 non-coding SNPs in strong LD with them. Based on our earlier finding that cis-eQTL SNPs are enriched in intergenic regions, we annotated these SNPs with ANNOVAR and removed 49,218 non-intergenic SNPs. Of the remaining non-coding SNPs, 2,855 lie within the sequences of 1,430 lncRNA transcripts. Finally, we assessed the effect of these SNPs on the secondary structure of the corresponding lncRNAs by calculating the change in minimum free energy. In total, 438 SNPs strongly affected the secondary structure of 374 lncRNAs, which we defined as candidate multiple sclerosis-related lncRNAs.

Third, we tested the association between these candidate lncRNAs and multiple sclerosis by integrating large-scale RNA-seq data, and further explored their functions in the disease. We collected all multiple sclerosis-related human RNA-seq data from three authoritative databases and quantified lncRNA expression in each sample. After quality control, 59,428 quantified lncRNA transcripts were retained, including 173 candidate lncRNA transcripts. We then integrated these data with an expression-based meta-analysis and performed a differential expression analysis. Thirty-five candidate lncRNA transcripts were significantly differentially expressed between multiple sclerosis cases and controls; we defined these as potential multiple sclerosis-related lncRNAs. A hypergeometric distribution-based assessment indicated that our pipeline is effective at identifying potential multiple sclerosis-related lncRNAs. Finally, weighted gene co-expression network analysis, neighboring-gene and eQTL analyses, and gene set enrichment analysis suggested that these potential multiple sclerosis-related lncRNAs may be involved in regulating fatty acid and steroid metabolism.

Fourth, we explored the functions and distribution of these potential multiple sclerosis-related lncRNAs in oligodendrocytes using a mouse scRNA-seq dataset. By quantifying the expression of these lncRNAs and performing hierarchical clustering and t-SNE-based cell subset analysis, we found that oligodendrocytes can be divided into four cell subsets based on lncRNA expression. We then performed a differential expression analysis of lncRNAs in each oligodendrocyte subset, together with a human-mouse lncRNA sequence homology analysis. Five of the potential multiple sclerosis-related lncRNAs were significantly down-regulated in three oligodendrocyte subsets. Combined with the results of gene set enrichment analysis, this suggests that these lncRNAs may be involved in fatty acid metabolism and myelin sheath formation in oligodendrocytes.

In summary, this study established an analysis pipeline and used it to identify novel potential multiple sclerosis-related lncRNAs. It is also the first study to explore the influence of SNPs on lncRNAs in multiple sclerosis at genome-wide scale, and the first to characterize multiple sclerosis-related lncRNAs in oligodendrocytes. These findings should help improve understanding of the pathogenesis of multiple sclerosis.
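The cis-eQTL analysis in the first step asks, for each SNP-lncRNA pair, whether genotype predicts expression. The abstract does not describe the exact model, so the following is only a minimal sketch assuming a simple additive linear regression (allele dosage 0/1/2 against normalized expression) with no covariates; dedicated tools such as Matrix eQTL also adjust for covariates and apply multiple-testing correction. The function name `eqtl_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def eqtl_test(genotypes, expression):
    """Additive-model cis-eQTL test for a single SNP-lncRNA pair (sketch).

    genotypes:  allele dosages per sample (0, 1 or 2 copies of the alt allele)
    expression: normalized lncRNA expression in the same samples
    Returns (slope, t_statistic, approximate two-sided p-value).
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = fmean(genotypes), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in these samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, expression))
    slope = sxy / sxx  # change in expression per extra alt allele
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(genotypes, expression))
    if rss == 0:
        raise ValueError("perfect fit; t-statistic is undefined")
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t = slope / se
    # Assumption: normal approximation to Student's t with n - 2 df,
    # reasonable for cohort sizes like the 142 samples in the abstract.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return slope, t, p
```

In a genome-wide run this test would be repeated for every SNP within a chosen window around each lncRNA, which is why real pipelines use vectorized implementations.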
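The second step keeps non-coding SNPs that HaploReg reports as being in strong linkage disequilibrium with known multiple sclerosis SNPs. HaploReg derives LD from reference population panels; the sketch below only illustrates the r² statistic itself on phased haplotypes of two biallelic SNPs. The function name `ld_r_squared` is hypothetical, and the abstract does not state its r² cutoff (values such as r² ≥ 0.8 are commonly used for "strong" LD, but that threshold is an assumption here).

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes (sketch).

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b are
                the 0/1 alleles at SNP 1 and SNP 2 on that same haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n  # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical toy haplotypes: the two SNPs always co-occur.
    print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
```

Unlike D, r² is bounded between 0 and 1 regardless of allele frequencies, which is why it is the usual basis for "strong LD" cutoffs when expanding a GWAS SNP set to proxy SNPs.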
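The third step integrates several RNA-seq studies through an "expression-based meta-analysis" whose method the abstract does not specify. One common approach, shown here as an assumption rather than the author's method, is inverse-variance weighted fixed-effect pooling of per-study effect sizes (for example, log2 fold changes of one lncRNA transcript between cases and controls); random-effects models and p-value combination methods are alternatives. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis (sketch).

    effects:    per-study effect sizes for one transcript
    std_errors: matching standard errors from each study's DE analysis
    Returns (pooled_effect, pooled_se, z, two_sided_p).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need matching, non-empty inputs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies count more
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_w
    pooled_se = sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical log2 fold changes and standard errors from three studies.
    print(fixed_effect_meta([0.8, 1.1, 0.5], [0.3, 0.4, 0.5]))
```

A fixed-effect model assumes all studies estimate the same true effect; if tissues or platforms differ a lot between studies, a random-effects model is usually the safer choice.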
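The abstract also reports a hypergeometric distribution-based assessment of the pipeline without giving its exact setup. A plausible reading, assumed here, is a test of whether candidate lncRNA transcripts are over-represented among the significantly differentially expressed transcripts. The sketch computes the exact upper-tail probability P(X ≥ k) with the standard library only; the function name and the toy numbers in the usage example are hypothetical, not values from the dissertation.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all quantified transcripts
    successes:  transcripts flagged as candidates
    draws:      transcripts selected by a second criterion (e.g. DE)
    k:          observed overlap between the two sets
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("invalid hypergeometric parameters")
    total = comb(population, draws)
    # Sum exact integer counts; math.comb returns 0 for impossible terms.
    numer = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), min(successes, draws) + 1)
    )
    # int / int true division is correctly rounded, even for huge integers.
    return numer / total


if __name__ == "__main__":
    # Hypothetical toy numbers: 1,000 transcripts, 20 candidates,
    # 100 DE transcripts, 8 of which are candidates (about 2 expected).
    print(hypergeom_upper_tail(8, 1000, 20, 100))
```

Working with exact integer binomial coefficients avoids the floating-point underflow that naive factorial ratios hit at transcriptome scale.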
Keywords/Search Tags: Bioinformatics, Multiple sclerosis, LncRNAs, SNPs, RNA-seq