Hepatocellular carcinoma (HCC) is widespread in China, with an estimated 400,000 new liver cancer cases. The mortality rate of HCC is very high: about 383,000 people die of HCC each year in China alone, accounting for 51% of HCC deaths worldwide. Metastasis is the main cause of death in patients with liver cancer, so studying the molecular mechanism of HCC invasion and metastasis is of great significance. The occurrence, development and metastasis of HCC involve complex regulation at the level of the genes themselves, transcription, translation and post-translational modification, and a large number of genes are involved. With the rapid development of omics techniques, the acquisition and analysis of biological data of different levels and types are maturing, creating an opportunity for the systematic study of the molecular mechanism of liver cancer metastasis.

Multi-omics data integration analysis is a multidisciplinary field and faces problems on two sides. On the one hand, the development of integration tools focuses on optimizing algorithms and software, while the understanding of the underlying biological problems is not deep enough and the tools are rarely applied to them. On the other hand, multi-omics studies of disease focus mainly on biological questions, but they cover too few omics levels and integrate the data too simply. To address these problems, this study used a model of HCC cell lines with different metastatic potential as its research object. By integrating the multi-omics data, we established a complete multi-omics data integration process that starts from the raw omics data. Using this process, we can systematically investigate how HCC cell lines change at the
gene level, transcription level and protein level. The HCC cell lines with different metastatic potential share the same genetic background, and as metastatic potential increases, the model system changes at the gene, transcription and protein levels. We hypothesized that these changes would show a trend and directionality as metastatic potential increases. To test this hypothesis, we applied the multi-omics data integration analysis process to the model system.

The raw data of the HCC cell lines were processed with the raw data processing and quality control analysis process, which includes genetic variation detection, quantification of gene expression abundance and quantification of protein expression abundance. A total of 224 mutant genes were identified at the gene level, 17,032 genes were quantified at the transcriptional level, and 5,654 proteins were quantified at the protein level; 5,405 genes/proteins were quantified at both the transcriptional and protein levels. Through the integration analysis, we showed that the multi-omics data of the HCC cell lines change directionally as metastatic potential increases. Furthermore, by combining principal component analysis with screening for differentially expressed genes/proteins, we identified 227 candidate genes/proteins that were significantly associated with the enhanced metastatic potential of HCC cell lines. Based on the correlation between their expression patterns, the 227 candidate genes/proteins can be divided into three modules. One module was significantly enriched in biological processes such as cell cycle progression, spindle fiber formation and chromosome segregation, and its proteins were located in the centromere, kinetochore, microtubules and other regions that the cell cycle and cell movement are closely related
to. This suggests that the module might promote cell proliferation and enhance migration ability as metastatic potential progressively increases. The other two modules were significantly enriched in biological processes such as membrane fusion, unfolded protein binding and regulation of receptor signaling pathways, and might play a role in signal transmission and regulation. The mutant genes and candidate genes/proteins were significantly enriched in the pathways of bacterial invasion of epithelial cells, thyroid cancer and the citric acid cycle. The bacterial invasion of epithelial cells pathway involves the interaction of proteins with cell surface receptors and leads to changes in the cell membrane and cytoskeleton, which might be related to the increased metastatic potential of HCC cell lines. In addition, we found that the gene expression pattern of the metastatic HCC cell lines was more similar to that of lung cancer tissue, which might explain why these cell lines have a strong capacity for lung metastasis.

Whether the conclusions of the cell-level multi-omics data integration analysis also hold in the human body needs to be validated in large population samples. By mining and querying the large population samples of TCGA and HPA, we found that 191 of the 224 mutant genes detected in HCC cell lines were also detected as mutated in the TCGA liver hepatocellular carcinoma dataset, accounting for 83% of the 224 identified genes. Of the 227 candidate genes/proteins that were significantly associated with the increased metastatic potential of HCC cell lines, 39 genes were significantly associated with the pathologic stage of liver cancer. Among them, the transcriptional expression of ZWILCH, NUF2, CENPQ, ZWINT, DLGAP5 and CD2AP was positively correlated with pathologic stage, and the survival time of
cancer patients with high transcriptional expression was significantly decreased. DLGAP5 and CD2AP were also highly expressed in liver cancer tissues, indicating that the candidate genes DLGAP5 and CD2AP might be closely related to the development and progression of cancer, which supports the effectiveness of the technical system we established.

In summary, we have established a complete multi-omics data integration analysis technology system consisting of a raw data processing and quality control analysis process and an integrated multi-omics data analysis process. Based on this system, we demonstrated that the multi-omics data of HCC cell lines with different metastatic potential change directionally, systematically screened out the candidate genes and proteins that were significantly associated with increased metastatic potential, and explored the biological functions of those genes/proteins in the development of tumor metastasis. In addition, we found that the gene expression pattern of the metastatic HCC cell lines was more similar to that of lung cancer tissue, which might explain why these cell lines have a strong capacity for lung metastasis. Through data mining of large-scale population samples, we not only verified our findings from a series of metastatic HCC cell lines, but also found that high expression of ZWILCH, NUF2, CENPQ, ZWINT, DLGAP5 and CD2AP promoted the development and progression of cancer. This technical system provides a technical solution for multi-omics data integration analysis of complex diseases.
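
The abstract states that the 227 candidates were divided into three modules based on the correlation between their expression patterns, but it does not say which grouping procedure was used. The following Python snippet is only a minimal sketch of one simple way such a grouping could work. It assumes a Pearson correlation threshold of 0.9 and treats modules as connected components of the resulting correlation graph. The gene names, expression values and the function name `correlation_modules` are hypothetical, chosen for illustration, and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_modules(profiles, threshold=0.9):
    """Group genes into modules (hypothetical helper, not the authors' method).

    Two genes are linked when their Pearson correlation is >= threshold;
    a module is a connected component of that graph (union-find).
    """
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the component root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(modules.values())


# Made-up expression values across four cell lines ordered by
# increasing metastatic potential (illustrative only).
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.2, 7.9],
    "GENE_C": [4.0, 3.1, 2.0, 1.2],
    "GENE_D": [8.0, 6.0, 4.5, 2.0],
    "GENE_E": [1.0, 3.0, 1.0, 3.0],
}

modules = correlation_modules(profiles)
for index, module in enumerate(modules, start=1):
    print(f"module {index}: {module}")
```

With these toy values, the two rising profiles form one module, the two falling profiles form another, and the oscillating profile stays on its own. Real analyses more often use hierarchical clustering or a dedicated co-expression method, and the threshold here is an arbitrary choice.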