
The Inferences Of Biological Entities Interactions Based On Graph Subspace Ensemble Learning

Posted on: 2021-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Ma
Full Text: PDF
GTID: 1360330605464311
Subject: Applied Statistics
Abstract/Summary:
Biomolecules operate within complex systems, and any biological function is carried out by many biomolecules acting together. Studying the nature and function of biomolecules from a network perspective is therefore an important direction in current bioinformatics. With the rapid development of emerging biotechnology and database technology, massive amounts of omics data have been generated. These data give researchers abundant sources for exploring life activities at the molecular level, and they make it possible to construct interaction networks among different biological entities. Integrating multi-omics data to construct biological networks and infer the interactions between different biological entities is an important research topic in network biology: it can reveal cooperation mechanisms between molecules, help researchers understand the functions of biological molecules, and deepen the understanding of the occurrence, development and activity of complex diseases.

However, given the huge volume of biological data, relying only on biological experiments to explore interactions consumes substantial human and material resources and takes a long time. As the core of artificial intelligence, machine learning plays an important role in many fields, and with the development of computer technology and statistical theory a large number of machine learning models have been proposed. Guided by statistical theory, using machine learning models and computer technology to solve biological problems is an important research method in bioinformatics at present. Used as an aid and guide for biological experiments, data mining and machine learning can quickly perform preliminary screening on massive data, greatly reducing the time and resource costs of experiments.

Addressing the problems in current biological entity interaction inference models, this dissertation studies network construction based on multi-kernel neighborhood similarity, an interaction prediction model based on multi-network fusion and neighborhood bidirectional propagation, and a graph-based heterogeneous network sparse subspace ensemble learning model, and applies these models to three important types of biological interaction prediction problems. The specific work is as follows.

First, under the assumptions that non-neighborhood samples may carry important information and that samples may have non-linear structural relationships, a network construction model based on multi-kernel neighborhood similarity (MKSNS) is established. Setting different regularization weights on neighborhood and non-neighborhood samples avoids the complete loss of important non-neighborhood information that occurs in the traditional linear neighborhood model. Introducing the kernel method lets the model respond flexibly to a variety of situations, and the multi-kernel formulation further broadens the model's scope of application and reduces the complexity of selecting kernel functions. Experiments on five feature datasets for the lncRNA-protein interaction inference problem show that MKSNS achieves better prediction results on most evaluation indicators.

Second, most current interaction inference models cannot effectively use multi-source information, rely too heavily on known interaction networks, have poor parameter robustness, and have limited predictive ability for isolated samples that have no interaction information. To address these issues, this dissertation proposes an interaction prediction framework based on multi-network fusion and neighborhood bidirectional propagation (MNF-NBP). Most biological network inference problems are characterized by a small proportion of interacting samples and no confirmed non-interacting samples, and a single data source often cannot contain complete information because of technical limitations. To obtain a more accurate biological network structure and avoid over-dependence on known interactions, the model builds the network by integrating multi-source information and fusing heterogeneous network data, which effectively eliminates the network bias caused by information missing from any single data source. In addition, the model uses a network completion strategy to reduce the errors caused by sparse interaction networks, and a neighborhood bidirectional propagation model to ensure the exchange of information across the heterogeneous network. Experiments on miRNA-disease interaction inference show that MNF-NBP not only predicts unknown interactions effectively but also makes predictions for isolated samples that have no interaction information.

Third, most subspace learning models cannot effectively integrate multi-source features with network structure, cannot reflect the importance of labeled samples while mining the information in unlabeled samples, and cannot effectively integrate information from heterogeneous data sources. To address these limitations, this dissertation proposes a graph-based heterogeneous network sparse subspace ensemble learning model (GHNSSL). The model uses sample label information hierarchically by assigning importance levels, applies a neighborhood Laplacian regularization operator to ensure the smoothness of the subspace features, and constructs a sparse subspace learning model on the heterogeneous network to integrate the various original feature information. A weighted K-neighborhood feature completion strategy is also proposed to extend the model's prediction performance to isolated samples. Experimental results on two interaction inference problems (lncRNA-protein interaction inference and viral-human protein interaction inference) show that the model has good prediction performance for new interactions and isolated samples and is quite robust to noisy interaction networks.

The models proposed in this dissertation take biological problems as their main research object and achieve good results in predicting biological interaction relationships. In addition, the proposed multi-kernel neighborhood similarity network construction model, the multi-network fusion and neighborhood bidirectional propagation model, and the graph-based heterogeneous network sparse subspace ensemble learning model have certain theoretical value, and these methods are also applicable to related research in other fields.
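
The abstract does not give the MKSNS formulation, so the following is only an illustrative sketch of the general idea it describes: each sample is reconstructed from all other samples in a kernel feature space, with a small ridge penalty on its nearest neighbors and a larger one on non-neighbors, so that non-neighbors are down-weighted but not discarded. Everything specific here is an assumption made for illustration: the function names, the plain averaging of RBF kernels as the multi-kernel step, the penalty values `lam_near` and `lam_far`, and the omission of any non-negativity or sum-to-one constraints that a linear neighborhood model might impose. For sample i the weights solve w = (K + Λ)^{-1} k_i, where K is the kernel matrix over the other samples, k_i holds the kernel values between them and sample i, and Λ is the diagonal matrix of per-sample penalties.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def combine_kernels(X, gammas=(0.1, 1.0, 10.0)):
    """Assumption: the multi-kernel step is a plain average of RBF kernels."""
    return sum(rbf_kernel(X, g) for g in gammas) / len(gammas)


def neighborhood_similarity(K, n_neighbors=3, lam_near=0.1, lam_far=10.0):
    """Row i holds weights that reconstruct sample i from all other samples.

    Neighbors (closest in kernel-induced distance) get the small penalty
    lam_near; every other sample gets the larger penalty lam_far.
    """
    n = K.shape[0]
    diag = np.diag(K)
    dist2 = diag[:, None] + diag[None, :] - 2.0 * K  # kernel-induced distances
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        nearest = others[np.argsort(dist2[i, others])[:n_neighbors]]
        lam = np.where(np.isin(others, nearest), lam_near, lam_far)
        # Closed-form minimizer of the penalized kernel reconstruction error.
        A = K[np.ix_(others, others)] + np.diag(lam)
        W[i, others] = np.linalg.solve(A, K[others, i])
    return W


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))          # toy feature matrix, not real omics data
W = neighborhood_similarity(combine_kernels(X))
print(W.shape)
```

In this sketch, making `lam_far` very large pushes the non-neighbor weights toward zero and recovers a neighbors-only model, while a moderate value keeps some non-neighborhood information in the similarity network.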
Keywords/Search Tags: Biological networks, Multi-kernel neighborhood similarity, Network fusion, Neighborhood bidirectional propagation, Sparse subspace learning, Ensemble learning