Interpretation And Analysis Of The LINCS Biological Big Data

Posted on: 2016-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Yin
GTID: 2310330536967243
Subject: Biomedical engineering
Abstract/Summary:
Genome-wide gene expression profiling uses microarray techniques to quantify the expression of all genes at the transcriptome level, and thereby establishes a connection between the information encoded in DNA and the phenotype of a living organism. It allows us to measure, from a global view, the transcriptional state of a cell or organism under given experimental conditions and at a given time point, and to analyze the differential expression of all genes after exposure to a given perturbagen. Moreover, the similarity between the gene expression profiles of different phenotypes can be measured by comparing their signature gene sets. If the expression profiles of two phenotypes are similar, their inducing factors (perturbagens) are likely to reinforce each other when both are applied. If the profiles are inverse to each other, one inducing factor is likely to act as an inhibitor of the other, and can therefore be used as a therapy to reverse the cellular state and restore it.

On the basis of this signature gene set matching theory, Connectivity Map (CMAP) data has been widely used in drug repurposing, lead discovery, mode-of-action analysis and related areas. However, the amount of CMAP data is limited by the low diversity of cell lines, chemical compounds, experimental doses and time points, which weakens such analyses. The LINCS data, generated by the same group at the Broad Institute, was produced to meet this need. It contains more than 130,000 gene expression profiles in 77 typical cell lines under perturbation by more than 4,000 gene silencing reagents and over 7,000 chemical compounds. Because the Broad Institute did not provide detailed meta-information, we give a brief introduction to and an elementary analysis of the LINCS biological big data, covering the data format, the data source, data usage and how to access the data. In addition, we introduce the methods of gene expression profile analysis and describe in detail gene set enrichment analysis (GSEA), which is widely used and performs well.

To demonstrate that the LINCS data adds information to existing knowledge, we constructed a gene-based perturbation relationship network from the gene silencing subset, and analyzed the proportion of gene pairs with a significant perturbation relationship whose corresponding proteins also interact. Further, we assessed the proportion of significantly perturbed gene pairs that lie in the same pathway, or in different pathways that share at least one gene as a link. The results show that the known Gene Ontology biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways cover only part of the knowledge that can be extracted from the gene expression profiles.

Next, based on the hypothesis that homologous genes lead to similar effects when silenced by interference, we analyzed the gene silencing data in the HEPG2 cell line: we calculated pairwise similarities to obtain a similarity matrix and clustered the genes with the Affinity Propagation algorithm. Genes in the same cluster tended to be more similar to each other than genes in different clusters when treated with the gene silencing reagents. Finally, we performed functional annotation of the genes in each cluster on the Gene Ontology website to identify the functional enrichment of the different clusters, and predicted potential biological processes in which unannotated genes might participate.
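The signature matching idea described in the abstract can be illustrated with a small sketch. It is not the thesis's own implementation. It assumes an unweighted, Kolmogorov-Smirnov style running-sum enrichment score (GSEA as commonly used adds weighting and permutation testing) and a simplified connectivity-style score without the final normalization. The function names `enrichment_score` and `connectivity`, and the gene labels `G1` to `G8`, are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set in ranked_genes.

    ranked_genes: genes ordered from most up-regulated to most down-regulated.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    gene_set = set(gene_set)
    n_total = len(ranked_genes)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene_set must cover some, but not all, ranked genes")

    hit_step = 1.0 / n_hits                # step up when a set member is met
    miss_step = 1.0 / (n_total - n_hits)   # step down otherwise

    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def connectivity(ranked_genes, up_genes, down_genes):
    """Simplified connectivity-style score of a query signature against a profile.

    Positive: the query's up/down genes sit at the top/bottom of the profile
    (similar effect). Negative: they sit at opposite ends (reversing effect).
    Zero: both sets are enriched at the same end, so the match is ambiguous.
    """
    es_up = enrichment_score(ranked_genes, up_genes)
    es_down = enrichment_score(ranked_genes, down_genes)
    if es_up * es_down > 0:
        return 0.0
    return es_up - es_down


if __name__ == "__main__":
    # Hypothetical reference profile, ranked from most up- to most down-regulated.
    profile = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
    similar = connectivity(profile, up_genes={"G1", "G2"}, down_genes={"G7", "G8"})
    inverse = connectivity(profile[::-1], up_genes={"G1", "G2"}, down_genes={"G7", "G8"})
    print("similar profile:", similar)
    print("inverse profile:", inverse)
```

Under these assumptions, a strongly positive score corresponds to the "similar profiles" case in the abstract and a strongly negative score to the "inverse profiles" case, which is the situation of interest for drug repurposing.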
Keywords/Search Tags:gene expression profile, differentially expressed genes, gene expression signature, signature gene set, gene set enrichment analysis, LINCS data