A Deep Neural Network Model Integrating Protein Interactions For Prioritizing Cancer-related Proteins And Drug Target Combination

Posted on: 2019-09-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Chang
GTID: 1360330548953419
Subject: Bioinformatics
Abstract/Summary:
Mining valuable information from large datasets and describing complex systems with elegant, precise models have always been central objectives of scientific research. Two separate approaches are commonly used to reach these goals: data mining and complex network analysis. The two approaches share the same objective and have many similar applications, yet they are rarely combined in a single study. Using them together can provide a new and better strategy for big data analysis. In this study, we apply complex network and data mining approaches to two different biological datasets and obtain meaningful results. The work presented in this dissertation demonstrates the integration of complex networks and data mining.

Decades of cancer research have produced a large amount of valuable data. Identifying the pathways and key genes involved in cancer from these datasets is a prime objective of biologists and cancer researchers. Many methods have been developed to analyse these datasets, but every method has its limitations, and overcoming them remains a challenging task for bioinformaticians. Methods that integrate protein interaction networks with other omics data have attracted considerable attention, because protein interactions play an important role in cell signal transduction, cell adhesion, gene regulation and other processes. In this dissertation we combine the protein interaction network with gene expression and gene essentiality data to prioritize cancer-related genes and protein combinations, providing a new strategy for combining complex networks and data mining. The dissertation covers two subjects.

(1) Prioritizing cancer-related genes and gene combinations using the protein-protein interaction network integrated with gene expression from different samples. The protein-protein interaction network is a typical complex network in which each edge represents one interaction between a pair of proteins. Gene expression datasets usually contain expression profiles from cancer tissues, cancer cell lines and normal tissues. Comparing cancer cells with normal cells provides key information about the functional genes involved in cancer. We used the protein interaction network to build a sparse auto-encoder and trained it on differential expression data from cancer cells and normal cells. After training, the auto-encoder was used to build a deep model that calculates the knockdown effect of genes. The knockdown effects were recorded on the network, and the resulting knockdown-effect network was used to select key cancer-related genes. This study produced 500 high-confidence cancer-related proteins, of which 211 are known cancer drug targets, supporting the accuracy of our method. Gene ontology enrichment analysis of the remaining genes showed a strong functional relationship with cancer. Compared with other reported methods, this model achieved a high AUC (>0.8). The knockdown-effect network can also be used to identify protein combinations, which in this dissertation may be synthetic lethal pairs or drug target combinations. The knockdown-effect network contains a special group of proteins that are closely related to the proteins in combination pairs. We used known protein combinations to select these proteins and then used them to identify new protein combinations. Ten-fold cross-validation showed that this method predicts combinations with high accuracy (>0.85) and can be used to find new protein pairs. Finally, we applied the model to single-cell sequencing data from prostate cancer cells, in which we found both new and known prostate cancer genes. Because single-cell sequencing can capture the evolution of cancer cells, these results point to future applications of the model in cancer therapy and research.

(2) Mining complex gene-gene functional relationships using the protein-protein interaction network and gene essentiality profiles. In this subject the gene essentiality profiles were produced by CRISPRi screens, which perturb the genes of cell lines using an sgRNA library. The gene essentiality value represents the difference in sgRNA abundance between cells after a period of time and the initial cells. A linear model was used to transform gene essentiality into protein interaction essentiality, which was then used to calculate the correlations between protein interactions. Using these correlations, we found new relationships between genes, in which cytokines acted as hubs of the correlation network. As cytokine signaling is one of the most important pathways in leukemia cells, these related interactions provide new information for further research. This study also ranked and selected essential protein interactions, and the sub-networks of key genes provide clear clues to functional interactions. Finally, essential protein interactions were used to determine the connection between up-regulated genes and frequently mutated genes, a promising approach for cancer gene analysis.

Both methods use protein interactions and omics data to build data mining models, which proved useful for the analysis of cancer-related genes. The work in this dissertation demonstrates that integrating complex networks and data mining can provide useful tools for biological data analysis.
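
As a rough illustration of the second subject, the step from gene essentiality to protein interaction essentiality, followed by the correlation between interactions, might be sketched as below. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the abstract does not give the form of the linear model, so an equal-weight sum of the two genes' profiles is assumed here, and the gene names, scores, interactions and the helper names `interaction_essentiality` and `pearson` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: one essentiality score per gene in each of four cell lines.
gene_essentiality = {
    "A": [-1.0, -0.5, 0.0, 0.5],
    "B": [-0.8, -0.4, 0.1, 0.6],
    "C": [0.9, 0.2, -0.3, -0.7],
    "D": [0.7, 0.3, -0.1, -0.9],
}

# Hypothetical protein-protein interactions (undirected edges).
interactions = [("A", "B"), ("C", "D")]


def interaction_essentiality(u, v, w_u=0.5, w_v=0.5):
    """Combine two gene profiles linearly into one profile for the edge (u, v).

    The equal weights are an assumption; the abstract only says that
    a linear model was used.
    """
    return [w_u * a + w_v * b
            for a, b in zip(gene_essentiality[u], gene_essentiality[v])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# One essentiality profile per interaction, then the correlation between them.
edge_profiles = {edge: interaction_essentiality(*edge) for edge in interactions}
r = pearson(edge_profiles[("A", "B")], edge_profiles[("C", "D")])
print(f"correlation between A-B and C-D: {r:.3f}")
```

In a correlation network built this way, each interaction would be a node and strongly correlated pairs of interactions would be linked, so hub interactions such as the cytokine-related ones mentioned above would appear as nodes with many links. The threshold for "strongly correlated" is another choice the abstract does not specify.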
Keywords/Search Tags:cancer gene, auto-encoder, deep learning, gene essentiality, protein interaction