Research On The Key Technologies Of Multi-source Biological Data Integration And Mining

Posted on: 2020-04-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Guo
GTID: 1480306740972909
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid advance of next-generation high-throughput sequencing technologies, the cost of biological sequencing has declined sharply and large-scale sequencing data of many kinds have been produced. This abundance of biological big data offers good opportunities to study biological knowledge comprehensively with computational techniques. Because biological systems are complex, most life activities depend on the cooperation of biological elements at multiple levels, so computational methods based on single-source data can rarely reveal a complex biological system in full. In recent years, integrating multi-source biological data to study and mine complex biological knowledge has therefore become a research hotspot in bioinformatics. In this dissertation, we aim to reveal and understand the biological mechanisms relevant to human diseases by integrating multi-source biological data. Specifically, for computational problems in human disease analysis, we develop data integration models and mining methods for biological network mining, cancer subtyping, and the prediction of regulatory associations in post-transcriptional RNA alternative splicing (AS), each integrating different types of biological data. The main content and contributions of this dissertation are as follows:

(1) Because existing static protein-protein interaction (PPI) networks do not capture the temporal and spatial information of protein interactions, we propose a novel method to construct dynamic protein interaction networks by integrating multi-source biological data. Based on the constructed dynamic networks, we propose new methods to mine protein complexes (CBMI) and functional modules (HFMD) from a dynamic view. We integrate the static PPI network, time-series gene expression data and protein subcellular localization data to construct a series of dynamic PPI networks. We then distinguish protein complexes from functional modules in terms of biological function and network structure under the dynamic interaction view, and use the proposed methods to detect them in the dynamic networks. Systematic experiments show that the proposed CBMI method can detect more accurate protein complexes and the HFMD method can identify more biologically meaningful protein functional modules.

(2) Because most traditional module detection methods are not accurate enough to identify hybrid regulatory modules in heterogeneous biological networks, we propose a novel method (dHMR) to detect hybrid regulatory modules from two-class heterogeneous biological networks. The method first considers the distributions of the different types of interactions in the network and predicts the module affiliation probabilities of each interaction according to a generative model of stochastic networks. It then detects hybrid modules by partitioning the whole network based on the predicted module affiliations of the interactions. Systematic experiments show that the proposed method can detect more accurate hybrid regulatory modules.

(3) Because existing cancer subtyping methods leave room for improvement, we propose three methods that integrate different types of biological data to perform cancer subtyping. First, most existing methods account for neither the inaccuracy of similarities across samples nor the contribution weights of different data sources, so we propose a similarity regression fusion (SRF) method to predict cancer subtypes by integrating multi-omics data. The method corrects the similarity information across samples in each data view by incorporating multi-omics data, and considers the contribution weight of each data view during integration. Second, most existing methods do not consider the associations among data features in multi-source data, so we propose a novel method (CSPRV) that integrates multi-omics data and heterogeneous biological regulatory networks to perform cancer subtyping. The method extracts multiple complex data features from the heterogeneous biological regulatory networks and then predicts cancer subtypes based on the extracted features. Third, to address the high dimensionality of biological data and the effect of data platform noise on data integration, we propose a hierarchical deep learning method (HI-SAE). It uses unsupervised autoencoder deep neural networks to learn low-dimensional representations of the input data, learns integrative features from the multi-source data, and performs cancer subtyping based on those integrative features. Experiments on different types of biological data show that the proposed methods can identify more clinically meaningful cancer subtypes than state-of-the-art methods.

(4) Because most existing methods are not accurate enough to predict the regulatory associations in post-transcriptional RNA alternative splicing (AS), we propose a novel statistical method (RMAS2) to predict the expression correlations between AS factors and AS events based on read counts. The depth of sequencing data usually affects the accuracy with which the AS level is estimated, so AS level estimates carry uncertainty. RMAS2 instead takes the read counts of AS events and the expression levels of AS factors as input and predicts the correlation associations between them directly. Systematic experiments show that the proposed method is robust to outliers and can accurately predict correlation associations in AS regulation.

In this work, we focus on developing integration models and mining methods for multi-source biological data, applied to biological network mining, cancer subtyping and association analysis in AS regulation. We propose a series of data integration models and mining methods for different bioinformatics problems and demonstrate their effectiveness through systematic experiments. We hope these methods will provide useful insight and research support for further elucidating the biological mechanisms of human disease progression and therapy.
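
The dynamic PPI network construction described in contribution (1) can be illustrated with a small sketch. The abstract does not state the actual construction rule, so everything below is an assumption made for illustration: an interaction from the static network is treated as active at a time point only if both proteins are expressed at or above a fixed threshold at that time point and share at least one subcellular compartment. The function name `build_dynamic_ppi`, the threshold value and the toy data are hypothetical and are not taken from the dissertation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_dynamic_ppi(
    static_edges: List[Edge],
    expression: Dict[str, List[float]],
    compartments: Dict[str, Set[str]],
    threshold: float,
) -> List[Set[Edge]]:
    """Return one set of active edges per time point.

    Assumed rule (not from the dissertation): an edge (a, b) is active at
    time t if both proteins have expression >= threshold at t and they
    share at least one subcellular compartment.
    """
    if not expression:
        return []
    # All expression profiles are assumed to have the same length.
    n_timepoints = len(next(iter(expression.values())))
    snapshots: List[Set[Edge]] = []
    for t in range(n_timepoints):
        active: Set[Edge] = set()
        for a, b in static_edges:
            # Skip interactions with no expression profile for either protein.
            if a not in expression or b not in expression:
                continue
            co_expressed = (
                expression[a][t] >= threshold and expression[b][t] >= threshold
            )
            co_located = bool(
                compartments.get(a, set()) & compartments.get(b, set())
            )
            if co_expressed and co_located:
                active.add((a, b))
        snapshots.append(active)
    return snapshots


# Toy data (hypothetical): three proteins, three time points.
static_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
expression = {
    "P1": [0.9, 0.2, 0.8],
    "P2": [0.8, 0.7, 0.1],
    "P3": [0.1, 0.9, 0.9],
}
compartments = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"cytoplasm"},
}

snapshots = build_dynamic_ppi(static_edges, expression, compartments, threshold=0.5)
for t, edges in enumerate(snapshots):
    print(t, sorted(edges))
```

In this toy example the P1-P3 interaction never becomes active: the two proteins are co-expressed at the last time point but share no compartment, which is the kind of spatial constraint a static PPI network cannot express.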
Keywords/Search Tags:Multi-source biological data, Data integration, Biological network mining, Cancer subtype prediction, Alternative splicing regulation