
Research On Key Technologies Of Intelligent Analysis For Biological Omics Big Data

Posted on: 2022-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K W Tan
Full Text: PDF
GTID: 1480306569958999
Subject: Computer Science and Technology
Abstract/Summary:
With the advancement of sequencing technologies, the development of medical image generation and analysis technologies, and the establishment of several large genome sequencing institutions, a large amount of molecular-level omics data and medical image data with unified standards has been generated and made public. The life sciences have entered an era of big data characterized by massive, multifaceted omics data. Intelligent analysis and mining of biological omics big data is an important way to advance precision medicine, but the characteristics of such data also pose great challenges for the key technologies of intelligent analysis. The problems of combinatorial explosion, the curse of dimensionality, interpretability and data heterogeneity hinder the application of existing intelligent analysis technologies in precision medicine. Studying intelligent analysis algorithms for biological omics big data will help to mine the biomedical knowledge contained in omics data quickly and accurately. This thesis therefore addresses key issues of intelligent analysis, namely single-omics feature identification, single-omics data representation, and multi-omics data representation and fusion, and applies the resulting methods to biomarker identification, disease diagnosis and prognosis prediction. The main research contents, innovations and contributions of this thesis are as follows.

(1) To alleviate the combinatorial explosion problem in higher-order SNP identification, this thesis proposes a discrete invasive tumor growth algorithm called DITGOssi (Discrete Invasive Tumor Growth Optimization SNP-SNP Interaction), based on invasive tumor growth optimization, a novel swarm intelligence optimization algorithm. In DITGOssi, a discrete invasive tumor growth algorithm, DITGO, is first designed to meet the discrete representation and time-sensitivity requirements of higher-order SNP identification. A two-stage search strategy is then used to further enhance the global search capability of DITGO in higher-order SNP identification tasks. Experiments show that DITGO has certain advantages over traditional swarm intelligence optimization algorithms for higher-order SNP identification, but neither the traditional algorithms nor DITGO handles SNP data without marginal effects well. Compared with common higher-order SNP identification algorithms, DITGOssi achieves a significant performance improvement, and it outperforms DITGO on SNP data both with and without marginal effects.

(2) To alleviate the curse of dimensionality and the interpretability problem in transcriptomics data representation learning, this thesis designs a transductive semi-supervised learning method based on graph convolution neural networks, called Hierarchical Graph Convolution Network (HiGCN). The method considers both sample interactions in the sample space and feature interactions in the feature space, and learns a better representation of transcriptomics data by aggregating neighborhood information in both spaces simultaneously. Over-smoothing is a common problem in graph convolution neural networks, and it becomes worse when information is aggregated in both spaces at once. HiGCN therefore includes a feature weighting layer to alleviate it. The networks that perform information aggregation in the feature space and in the sample space are called the sparse graph convolution neural network and the feature-weighted graph neural network, respectively. In addition, HiGCN can identify important features related to the prediction target, which makes the model's predictions interpretable. Experiments on disease typing and survival analysis show that HiGCN has better representation ability and interpretability, and performs more accurate disease typing and survival analysis on both simulated and real datasets.

(3) To alleviate the curse of dimensionality and data heterogeneity in multi-molecular omics data fusion, this thesis proposes a Multi-Omics Supervised Autoencoder (MOSAE) model. First, MOSAE adapts to different single-omics data by using an omics-specific autoencoder for each. Second, because general autoencoders learn representations in an unsupervised manner while supervised information is crucial for representing omics data, two types of supervised information are incorporated into MOSAE. Combining the supervised autoencoder with the omics-specific autoencoders forces MOSAE to learn both task-specific and omics-specific representations. Experiments on four clinical endpoint prediction tasks show that MOSAE fuses multi-molecular omics data more effectively and predicts clinical endpoints more accurately.

(4) To alleviate the curse of dimensionality and the more serious data heterogeneity problem in fusing molecular and image omics data, this thesis proposes a multimodal fusion framework called MultiCoFusion, based on multi-task correlation learning. First, image omics (i.e., pathomics) data and molecular omics (i.e., transcriptomics) data are represented separately, using ResNet-152 and a sparse graph convolution network, respectively. The two representations are then fused by concatenation and fed into a feedforward neural network to learn fused representations and multi-task shared representations. In MultiCoFusion, multi-task learning is performed by alternate training. Experiments on grade classification and survival analysis show that MultiCoFusion fuses transcriptomics and pathomics data more effectively, that the two tasks in the framework are strongly correlated, and that the multi-task learning strategy significantly improves the performance of both survival analysis and grade classification.
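The combinatorial explosion behind contribution (1) can be made concrete with a small sketch. The code below is not DITGO or DITGOssi: it is a brute-force baseline on toy data, included only to show how fast the number of candidate SNP combinations grows and what "SNP data without marginal effects" means. The chi-square fitness, the binary genotype coding, the XOR-style toy labels and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations, product


def search_space_size(n_snps, order):
    """Number of distinct SNP combinations of the given order: C(n_snps, order)."""
    return math.comb(n_snps, order)


def chi_square_score(genotypes, labels, combo):
    """Chi-square statistic between the joint genotype of `combo` and the label.

    Assumed fitness function for this sketch; labels are 0 (control) or 1 (case).
    """
    n = len(labels)
    cell_counts = {}  # joint genotype -> [number of controls, number of cases]
    for row, label in zip(genotypes, labels):
        key = tuple(row[i] for i in combo)
        cell_counts.setdefault(key, [0, 0])[label] += 1
    label_totals = [labels.count(0), labels.count(1)]
    score = 0.0
    for counts in cell_counts.values():
        cell_total = sum(counts)
        for label in (0, 1):
            expected = cell_total * label_totals[label] / n
            if expected > 0:
                score += (counts[label] - expected) ** 2 / expected
    return score


def exhaustive_search(genotypes, labels, order):
    """Score every combination of the given order and return the best one."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), order),
        key=lambda combo: chi_square_score(genotypes, labels, combo),
    )


def build_toy_data(n_snps=6, causal=(1, 4)):
    """Toy data: every binary genotype vector once, label = XOR of two SNPs.

    Each causal SNP alone is independent of the label (no marginal effect);
    only the pair is informative.
    """
    genotypes = [list(row) for row in product((0, 1), repeat=n_snps)]
    a, b = causal
    labels = [row[a] ^ row[b] for row in genotypes]
    return genotypes, labels
```

On this toy data, every single-SNP score is zero while the planted pair is recovered by `exhaustive_search(genotypes, labels, 2)`, which is why methods that rely on marginal signals struggle with such data. Exhaustive scoring does not scale: `search_space_size(1000, 3)` is already 166,167,000 candidates, which motivates heuristic searches such as the swarm intelligence approach described in the abstract.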
Keywords/Search Tags:omics big data, swarm intelligence optimization, graph convolution networks, autoencoder, multi-task learning