| In the era of "big data", mining valuable knowledge from large amounts of data is becoming increasingly important. Although machine learning has achieved great success in various research areas, it relies heavily on feature representation, and feature selection and construction often require in-depth expert knowledge. Recently, deep learning has been shown to learn inherent data features automatically, without guidance from experts. Advances in big data and in parallel and distributed computing have led to the widespread application of deep learning. In the post-genomic era, the rapid development of high-throughput sequencing technology has produced large volumes of omics data in biomedical research. Hence, many studies have used omics integration methods to gain insights into biological mechanisms and molecular characteristics. However, omics data integration still faces a series of challenges, including differences in biological background, batch effects, data normalization, and dimensionality reduction. To facilitate the development of omics integration, it is essential to address multimodal data fusion and to develop scalable, high-throughput, and user-friendly frameworks. Therefore, this research applies deep learning to omics data mining from two aspects: integration of cancer multi-omics data and integration of various data types in proteomics. The research is organized as follows. (1) First, a multi-omics integration model based on a graph convolutional network (GCN), named MoGCN, was developed for cancer subtype classification and analysis. After the data were pre-processed with an autoencoder and with similarity network fusion, respectively, the resulting feature vectors and patient similarity network (PSN) were input into the graph convolutional network for training and testing. This pipeline achieved the best performance on the breast cancer dataset and the validation pan-kidney cancer dataset from 
TCGA. The breast cancer case study also showed that the features captured by MoGCN could reveal the molecular characteristics of cancer subtypes and that the patient similarity network can provide an intuitive basis for clinical diagnosis. The findings confirmed that MoGCN has great potential for heterogeneous multi-omics integration, marker identification, and clinical diagnosis. (2) The extracellular matrix (ECM) is a complex scaffold surrounding cells that provides mechanical support and biochemical signals to cells and tissues, and it plays an essential role in their structural and functional integrity. ECM proteins can direct cell adhesion and migration, as well as signals for cellular growth, metabolism, and differentiation. This study integrated protein domains, physicochemical properties, and protein sequences to develop a deep neural network-based ECM prediction tool, ECMPride 2.0, and to construct a web-based database, ECMPride DB (http://ecmpridedb.hupo.org.cn/). The database provides a user-friendly web interface for browsing, searching, and downloading all potential ECM components, together with abundant biological annotations. It can serve as a valuable reference resource for ECM investigations and contribute to the discovery and validation of new human ECM proteins. (3) Finally, we combined a decellularization method with quantitative proteomics to construct a time-resolved extracellular matrix atlas of the developing human skin dermis. We elucidated the molecular differences and developmental characteristics of the dermal matrix during skin aging. In addition, the age-specific functions of different types of ECM proteins were systematically analyzed, providing a comprehensive view of development, aging, and damage repair in skin tissues. In summary, this study systematically conducted deep learning-based omics data integration and downstream biological analysis. The proposed cancer multi-omics data integration model MoGCN 
and the human extracellular matrix protein prediction model ECMPride 2.0 both achieved the best prediction performance, providing a new perspective on integrating heterogeneous biological big data with deep learning methods. Furthermore, the time-resolved ECM atlas constructed in this study identified ECM components of different types in toddlers, teenagers, adults, and the elderly, providing a new standard of age-specific skin ECM composition that may offer new clues for tissue engineering in skin regeneration. |