| Purpose: Soft tissue sarcomas are rare malignancies of mesenchymal origin with a complex classification. They are divided into simple karyotype and complex karyotype sarcomas according to whether they carry specific molecular genetic changes. For the former, it is easier to find effective molecular markers for diagnosis and treatment (e.g., gastrointestinal stromal tumor). However, the diagnosis and treatment of complex karyotype soft tissue sarcomas remain challenging in clinical practice because of their many histologic types, high heterogeneity, strong invasiveness and lack of specific genetic variation. With the development of high-throughput sequencing technology, the molecular expression patterns of complex karyotype soft tissue sarcomas have been demonstrated more clearly. Some studies have shown that certain histologic subtypes share similar expression patterns, while tumors within a single subtype can show different characteristics. Therefore, to seek novel molecular markers and explore pathogenesis, this study aims to uncover the internal similarity of complex karyotype soft tissue sarcomas, identify molecular subtypes, and find subtype-specific molecules or pathways based on multi-omics data (genome, transcriptome and epigenome). Finally, classifiers are built to predict the molecular subtypes. Methods: (1) Multi-omics data (RNA-seq, miRNA-seq, methylation and CNV) and clinical characteristics of complex karyotype soft tissue sarcomas were downloaded from the TCGA database. After data preprocessing, the four omics data types were integrated by similarity network fusion to create an overall similarity matrix. Molecular subtypes were identified by spectral clustering on this matrix, and the relationships between the molecular subtypes and clinical characteristics were analyzed. (2) Based on the newly defined molecular subtypes, a variety of bioinformatics methods were used to identify critical molecules or
pathways specific to the subtypes. First, weighted gene co-expression network analysis (WGCNA) was used to identify gene modules significantly related to the molecular subtypes. Hub genes were screened from these modules, and the influence of hub gene expression level on overall survival was assessed using survival analysis and the log-rank test. The overlap between the differentially expressed genes (DEGs) of a molecular subtype and the module genes was used for pathway enrichment analysis. Second, the R package estimate and the CIBERSORTx tool were used to evaluate the level of immune infiltration and the composition of immune cells in the tumor microenvironment, in order to find immune-related subtypes. Third, QDMR software was used to identify the differentially methylated regions of the molecular subtypes. Combined with a tumor suppressor gene library, subtype-specific tumor suppressor genes were screened based on the relationship between methylation level and gene expression in the differentially methylated regions. Finally, GISTIC2 software was used to identify recurrent CNV regions, and the correlation between hub gene expression levels and CNV was analyzed. (3) Two methods, Lasso and Boruta, were used to select the molecular features related to the response variable (molecular subtype). The selected feature sets were used to build regression models and support vector machine classification models to predict the molecular subtypes, and accuracy and macro-F1 were used to evaluate model performance. Results: (1) A total of 240 cases of complex karyotype soft tissue sarcoma from five histological types were obtained, including 57 dedifferentiated liposarcomas, 24 myxofibrosarcomas, 9 malignant peripheral nerve sheath tumors, 101 leiomyosarcomas (74 soft tissue leiomyosarcomas and 27 uterine leiomyosarcomas) and 49 undifferentiated pleomorphic sarcomas. All samples were divided into 5 molecular subtypes by unsupervised cluster analysis, which were labeled
C1 to C5. C2 was composed of leiomyosarcomas (15 uterine leiomyosarcomas and 50 soft tissue leiomyosarcomas). C3 mainly included dedifferentiated liposarcoma (25), myxofibrosarcoma (14) and undifferentiated pleomorphic sarcoma (17). Survival analysis showed that C2 and C3 had longer survival than C1 and C4. (2) Seven gene modules (marked by color) were identified by WGCNA, among which the blue and red modules were significantly correlated with C2 (r = 0.9) and C3 (r = 0.76), respectively, and 21 blue and 5 red hub genes were screened out. Among the blue hub genes, the expression of FSCN1, PGD, ASB2, MYH11, MRVI1 and PGM5 significantly affected overall survival, whereas none of the red hub genes did. Upregulated DEGs of C2 were significantly enriched in dilated cardiomyopathy, vascular smooth muscle contraction, cGMP-PKG and other muscle contraction-related signaling pathways. Upregulated DEGs of C3 were significantly enriched in ABC transporters, regulation of stem cell pluripotency, human papillomavirus infection, PI3K-Akt and other signaling pathways. Compared with the other subtypes, C5 exhibited the highest immune score (P < 0.0001) and the highest expression of immune checkpoint-related genes (PD-1 and PD-L1). Meanwhile, the fractions of regulatory T cells, CD8 T cells, resting memory CD4 T cells and monocytes in C5 were higher than in the other subtypes, whereas those of naive B cells, plasma cells and memory B cells were lower. When samples were grouped by median immune score, overall survival in the high-immune group was better than in the low-immune group (P = 0.026). C2 and C4 had more differentially methylated regions (508 vs. 179). The expression levels of 10 genes, including LOC728264, CMTM7, THY1 and CRYAB, were negatively correlated with methylation level (P < 0.05). THY1 is a known tumor suppressor gene, and its corresponding methylation site (probe cg13524082) was hypermethylated in C2. Overall, amplification of chromosomal region 12q15 was observed in 37.1% of samples, while deletions of 13q14.2 and 17p13.1 were observed in 72.5% and
50.8%, respectively. However, amplification of 12q15 appeared in only 1.5% of C2 samples (1/65). The expression levels of CPM, MDM2, SYNM, AKAP1, ABCA9 and CRYAB were correlated with the degree of copy number variation. (3) The prediction accuracy (0.842) and macro-F1 (0.832) of the Boruta-based SVM model were higher than those of the Lasso regression model (0.821 and 0.793), but the former took longer in the feature selection stage (4.96 min) and used more features (96) than the latter (79). Conclusions: The 240 complex karyotype soft tissue sarcomas were classified into five molecular subtypes. C2 consisted of leiomyosarcomas and had a better prognosis. FSCN1, PGD, ASB2, MYH11, MRVI1 and PGM5 could be used as prognostic markers for C2. Expression of THY1 was regulated by methylation level, and THY1 may be a potential tumor suppressor gene in C2. C3 may be associated with HPV infection. C4 had more differentially methylated regions, and CRYAB, whose expression level was influenced by both methylation and copy number variation, may be a key factor in C4. C5 was an immune-related subtype with the highest level of immune infiltration, in which immunosuppressive cells (Tregs) were highly infiltrated and immunosuppressive molecules (PD-1 and PD-L1) were highly expressed. When computation and time costs are not a concern, the Boruta-based support vector machine classifier performs better. In conclusion, this study identified the molecular characteristics and the specific prognostic markers or tumor suppressor genes of each subtype and established a prediction model for the molecular subtypes, which could provide a reference for further research. |
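
The two evaluation metrics named in the Methods, accuracy and macro-F1, can be illustrated with a minimal Python sketch. This is not the study's code: the function names `accuracy` and `macro_f1` and the toy labels are hypothetical, and the abstract does not say which software computed the reported values. Macro-F1 is taken here in its usual sense, the unweighted mean of per-class F1 scores, so that small subtypes count as much as large ones.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted subtype equals the true subtype."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never matches
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, not data from the study)
y_true = ["C1", "C1", "C2", "C2", "C3", "C3"]
y_pred = ["C1", "C2", "C2", "C2", "C3", "C1"]

print(round(accuracy(y_true, y_pred), 3))
print(round(macro_f1(y_true, y_pred), 3))
```

In this toy example the per-class F1 scores are 0.5 (C1), 0.8 (C2) and about 0.667 (C3), so macro-F1 is their plain average, while accuracy is 4 correct predictions out of 6.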