Absolute measured values of gene expression are affected by many factors: (1) large measurement variation and batch effects; (2) variation in the proportion of tumor epithelial cells in clinical tissue samples; (3) partial RNA degradation during specimen preparation and storage; and (4) amplification bias of low-input RNA. Thus, the “exact” quantitative information of gene expression is not reliable. In contrast, qualitative transcriptional features, namely the within-sample relative expression orderings (REOs) of genes, are highly robust. Within-sample REOs also carry a wealth of tumor information: the REOs of gene pairs are generally stable in a particular type of normal tissue but widely disrupted in diseased tissue. We therefore developed REOs-based bioinformatics methods to address four problems in tumor research.

1. The REOs-based method for identifying DEGs from cross-site integrated data

Because the differential expression signals between two phenotypes are weak, it is difficult for traditional methods to identify reproducible differentially expressed genes (DEGs), especially when the sample size is small. Experimental batch effects prevent different datasets from being combined directly. To address this, many researchers have combined multiple independent datasets using meta-analysis or batch-effect adjustment algorithms. However, as demonstrated in this study, these algorithms may distort true biological differences between two phenotypes and introduce unacceptably high false-positive rates. Previously, we developed RankComp to detect DEGs in individual disease samples by identifying REOs that are reversed in a disease sample relative to the highly stable REO landscape of the corresponding normal tissue. We further extended it to RankComp V2, which identifies population-level DEGs from the significantly stable REOs of a disease group. Because the within-sample REOs of gene pairs are insensitive to experimental batch effects, RankComp V2 can be applied to integrated cross-site data. Using nine expression datasets, we found that RankComp V2 detects DEGs more accurately in integrated cross-site data. We then combined five expression datasets of breast cancer patients who received neoadjuvant chemotherapy with paclitaxel, 5-fluorouracil, cyclophosphamide and doxorubicin. We applied RankComp V2 to the integrated estrogen receptor (ER)-negative and ER-positive data and identified 409 and 206 DEGs, respectively, between the response and non-response groups. The ER-negative and ER-positive DEG lists were enriched in DNA repair and immune-related pathways, respectively, suggesting different resistance mechanisms in the two subtypes.

2. The REOs-based method for identifying driver DEGs with absolute mRNA abundance changes

Many cancer cells are aneuploid and/or polyploid and contain more total RNA than normal cells. When the transcriptome sizes of two cells differ, directly comparing expression measurements made on the same amount of total RNA from each sample can only identify genes whose relative mRNA abundance (i.e., cellular mRNA concentration) changes. It cannot identify genes whose absolute mRNA abundance changes. We proved mathematically that the DEGs identified by RankComp V2, which is based on REO comparison, must change in both mRNA concentration and absolute abundance. We analyzed data for ten cancer types. For each cancer type, DEGs with absolute mRNA abundance changes were significantly more likely to overlap with known cancer driver genes and drug targets than DEGs with only mRNA concentration changes, which were identified exclusively by the SAM (Significance Analysis of Microarrays) method. DEGs with increased absolute mRNA abundance were enriched in DNA damage-related pathways, whereas DEGs with decreased absolute mRNA abundance were enriched in immune- and metabolism-associated pathways.

3. The REOs-based drug resistance predictive signature coupled with a drug-free prognostic signature: the case of early-stage ER-positive breast cancer

Two types of prognostic signatures have been developed to predict the risk of recurrence in ER-positive breast cancer patients. One type is for patients treated with surgery alone (drug-free prognostic signatures), and the other is for patients receiving post-operative tamoxifen therapy (drug resistance predictive signatures). However, some tamoxifen-treated patients would have had a low risk of recurrence even if they had been treated with curative surgery alone, and this confounds the identification of signatures predicting response to tamoxifen. To solve this problem, we developed two coupled signatures based on the within-sample REOs of gene pairs. First, we identified a prognostic signature of post-operative recurrence risk using 544 samples from ER-positive breast cancer patients treated with surgery alone. We then applied this drug-free signature to 840 samples from patients receiving post-operative tamoxifen therapy. This identified 553 samples from patients who would have been at high risk of recurrence had they been treated with surgery alone, and we used these samples to develop a signature predicting benefit from tamoxifen therapy. Both coupled signatures were validated in independent data.

4. The REOs-based method for distinguishing tumor subtypes: the case of reclassifying the ER status of breast cancer patients

Immunohistochemistry (IHC) assessment of ER status has low concordance among pathologists, and quantitative transcriptional signatures are highly sensitive to measurement variation and sample quality. Here, we developed a robust qualitative signature based on the within-sample REOs of genes to reclassify ER status. We started from gene pairs whose REOs were significantly stable in ER+ samples and stable in the reverse direction in ER- samples, identified concordantly in four datasets. From these we extracted a signature of 112 gene pairs. The signature determines a sample's ER status by evaluating whether the REOs within the sample significantly match the ER-positive REOs or the ER-negative REOs. It was validated by evaluating whether the reclassified ER-positive or ER-negative patients could benefit from tamoxifen therapy or neoadjuvant chemotherapy. The REOs-based signature can provide an objective assessment of the ER status of breast cancer patients and effectively reduce IHC misjudgments of ER status.

In this article, we developed bioinformatics methods based on qualitative transcriptional features. We applied them to identifying driver DEGs, analyzing integrated data, and identifying prognostic and subtype signatures of tumors (breast cancer). These robust methods facilitate the identification of tumor signatures and clinical research.
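The abstract states that within-sample REOs are insensitive to batch effects. A minimal Python sketch shows why in the idealized case. It assumes, as a simplification, that a batch effect acts on a whole sample as one strictly increasing transformation. The function names, the transform (`log2(3x + 50)`) and the expression values are hypothetical. Real batch effects need not be strictly monotone.

```python
import math
from itertools import combinations


def reos(sample):
    """Within-sample REOs of all gene pairs.

    `sample` maps gene name -> expression value. For each pair (a, b) with a
    before b alphabetically, the REO is True if a is expressed higher than b.
    Ties are not handled specially in this toy example.
    """
    genes = sorted(sample)
    return {(a, b): sample[a] > sample[b] for a, b in combinations(genes, 2)}


def simulated_batch_effect(sample, scale=3.0, shift=50.0):
    """Hypothetical batch effect: one strictly increasing transform of a whole sample."""
    return {g: math.log2(scale * v + shift) for g, v in sample.items()}


sample = {"G1": 120.0, "G2": 15.0, "G3": 640.0, "G4": 88.0}
batched = simulated_batch_effect(sample)

print({g: round(v, 2) for g, v in batched.items()})  # absolute values all change
print(reos(sample) == reos(batched))  # True: every within-sample ordering survives
```

Every measured value shifts, so any method based on absolute levels would need batch correction. The qualitative ordering of each gene pair, which is what REO-based methods use, is unchanged.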
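The population-level REO comparison behind RankComp V2 (method 1 above) can be illustrated with a simplified Python sketch. This is not the published algorithm. It assumes the following steps:

- A one-sided binomial test with p = 0.5 calls a gene pair's REO significantly stable within a group.
- Only pairs stable in both groups are kept.
- A two-sided Fisher's exact test is applied per gene to the counts of partners it ranks above or below in each group.

All function names are hypothetical. Multiple-testing correction and any additional filtering steps of the actual method are omitted. Both tests are written with `math.comb` so the sketch needs only the standard library.

```python
from itertools import combinations
from math import comb


def binom_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def stable_reos(samples, alpha=0.01):
    """Gene pairs whose REO is significantly stable across a group of samples.

    Returns {(g1, g2): +1 if g1 > g2 is the stable order, -1 if g1 < g2}.
    """
    genes = sorted(samples[0])
    stable = {}
    for g1, g2 in combinations(genes, 2):
        n_gt = sum(s[g1] > s[g2] for s in samples)
        n_lt = sum(s[g1] < s[g2] for s in samples)
        n = n_gt + n_lt  # tied samples are ignored
        if n == 0:
            continue
        if binom_upper_tail(n_gt, n) < alpha:
            stable[(g1, g2)] = 1
        elif binom_upper_tail(n_lt, n) < alpha:
            stable[(g1, g2)] = -1
    return stable


def reo_based_degs(control, disease, alpha_stable=0.01, alpha_gene=0.05):
    """Simplified population-level DEG call from stable REOs (no FDR control)."""
    ctrl, dis = stable_reos(control, alpha_stable), stable_reos(disease, alpha_stable)
    shared = ctrl.keys() & dis.keys()  # pairs stable in both groups
    degs = {}
    for g in sorted(control[0]):
        a = b = c = d = 0  # g above / below its partners in control; same in disease
        for pair in shared:
            if g not in pair:
                continue
            sign = 1 if pair[0] == g else -1  # view each REO from g's side
            if ctrl[pair] * sign > 0:
                a += 1
            else:
                b += 1
            if dis[pair] * sign > 0:
                c += 1
            else:
                d += 1
        if a + b == 0:
            continue
        if fisher_two_sided(a, b, c, d) < alpha_gene:
            degs[g] = "up" if c / (c + d) > a / (a + b) else "down"
    return degs


# Toy data: each sample is rescaled by a different factor, mimicking cross-site
# differences in absolute level that leave within-sample orderings untouched.
base = {"G0": 10, "G1": 20, "G2": 40, "G3": 80, "G4": 160, "G5": 320}
site_scales = [1.0, 2.0, 0.5, 3.0, 1.5, 0.8, 4.0, 0.3]
control = [{g: v * s for g, v in base.items()} for s in site_scales]
disease = [{**{g: v * s for g, v in base.items()}, "G0": 1000 * s} for s in site_scales]

print(reo_based_degs(control, disease))  # {'G0': 'up'}
```

The eight samples differ eightfold in overall level, yet no normalization is applied. Only G0, whose orderings against all five partners reverse, is called differentially expressed. The partners of G0 each see just one reversed pair, which the per-gene test does not flag.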
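The transcriptome-size argument in method 2 rests on a simple scale-invariance property, which can be written out explicitly. The notation is assumed for illustration and is not taken from the original text:

- a_i is the absolute mRNA abundance of gene i per cell.
- T is the transcriptome size (total mRNA per cell).
- c_i = a_i / T is the concentration that equal-input profiling measures.
- Superscripts N and C mark a normal and a cancer sample.

This is only the elementary step, not the full proof mentioned in the abstract.

```latex
% Assumed notation: a_i = absolute mRNA abundance of gene i per cell,
% T = \sum_k a_k = transcriptome size, c_i = a_i / T = measured concentration,
% superscripts N (normal) and C (cancer).
\begin{align}
c_i > c_j &\iff \frac{a_i}{T} > \frac{a_j}{T} \iff a_i > a_j \qquad (T > 0), \\
\frac{c_i^{C}}{c_i^{N}} &= \frac{a_i^{C}}{a_i^{N}} \cdot \frac{T^{N}}{T^{C}}, \\
a_i^{N} < a_j^{N},\; a_i^{C} > a_j^{C} &\implies \frac{a_i^{C}}{a_i^{N}} > \frac{a_j^{C}}{a_i^{N}} > \frac{a_j^{C}}{a_j^{N}}.
\end{align}
```

Each line makes one point:

- **Line 1:** a within-sample REO is the same whether it is read from concentrations or from absolute abundances.
- **Line 2:** a gene with unchanged absolute abundance (a_i^C = a_i^N) still shows a concentration fold change of T^N / T^C. For example, it appears halved if the cancer transcriptome is twice as large.
- **Line 3:** a reversed REO implies that the absolute fold change of gene i strictly exceeds that of gene j, whatever T^N and T^C are.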
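The sample-selection step that couples the two signatures in method 3 can be sketched in Python. The abstract does not say how the REO-based signatures combine their gene pairs, so the voting rule here is an assumption: a sample counts as high risk when more than half of the signature's pairs show the high-risk orientation. The gene names, sample values and function names are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]
# Each pair (a, b) is oriented so that "a expressed higher than b" votes for high risk.
Signature = List[Tuple[str, str]]


def high_risk_votes(sample: Sample, signature: Signature) -> int:
    """Number of signature pairs whose within-sample REO has the high-risk orientation."""
    return sum(sample[a] > sample[b] for a, b in signature)


def is_high_risk(sample: Sample, signature: Signature) -> bool:
    """Hypothetical majority-vote rule: high risk if more than half the pairs vote so."""
    return high_risk_votes(sample, signature) > len(signature) / 2


def select_training_samples(treated: Dict[str, Sample], drug_free: Signature) -> List[str]:
    """IDs of tamoxifen-treated patients whom the drug-free signature calls high risk,
    i.e. those expected to relapse had they received surgery alone."""
    return [pid for pid, s in treated.items() if is_high_risk(s, drug_free)]


drug_free = [("GA", "GB"), ("GC", "GD"), ("GE", "GF")]  # hypothetical gene pairs
treated = {
    "P1": {"GA": 9, "GB": 2, "GC": 7, "GD": 1, "GE": 1, "GF": 5},  # 2 of 3 votes
    "P2": {"GA": 1, "GB": 6, "GC": 2, "GD": 8, "GE": 3, "GF": 4},  # 0 of 3 votes
    "P3": {"GA": 5, "GB": 4, "GC": 6, "GD": 3, "GE": 9, "GF": 2},  # 3 of 3 votes
}
selected = select_training_samples(treated, drug_free)
print(selected)  # ['P1', 'P3']
```

Patient P2, who would likely have done well with surgery alone, is excluded. The retained patients would then form the training set for the tamoxifen benefit signature; that training step is not sketched here.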
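The match test in method 4 might be implemented along the following lines. This Python sketch makes several illustrative assumptions, none of which comes from the published procedure:

- A one-sided binomial test (p = 0.5) on the number of signature pairs whose within-sample REO agrees with the ER-positive orientation.
- The same test applied to the number agreeing with the ER-negative orientation.
- The significance level of 0.05.
- The function names.
- The 10-pair toy signature (the described signature has 112 pairs).

```python
from math import comb


def binom_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def classify_er_status(sample, signature, alpha=0.05):
    """Assign ER status from REO matches.

    `signature` lists gene pairs (a, b) oriented so that a > b is the ER-positive
    REO (and a < b the ER-negative one). Tied pairs are dropped.
    """
    k_pos = sum(sample[a] > sample[b] for a, b in signature)
    k_neg = sum(sample[a] < sample[b] for a, b in signature)
    n = k_pos + k_neg
    if n and binom_upper_tail(k_pos, n) < alpha:
        return "ER+"
    if n and binom_upper_tail(k_neg, n) < alpha:
        return "ER-"
    return "undetermined"


signature = [(f"A{i}", f"B{i}") for i in range(10)]  # hypothetical 10-pair signature
er_pos = {**{f"A{i}": 10.0 for i in range(10)}, **{f"B{i}": 1.0 for i in range(10)}}
er_neg = {g: 11.0 - v for g, v in er_pos.items()}  # every REO reversed
mixed = {**er_pos, **{f"A{i}": 0.5 for i in range(5)}}  # first five pairs reversed

for name, s in [("er_pos", er_pos), ("er_neg", er_neg), ("mixed", mixed)]:
    print(name, classify_er_status(s, signature))
# er_pos ER+
# er_neg ER-
# mixed undetermined
```

A simple majority vote would force a call on every sample. Requiring a significant match in one direction instead leaves borderline samples, such as `mixed`, unclassified.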