Construction Of lncRNA-Related Prognostic Risk Model For Breast Cancer Based On TCGA Database

Posted on: 2020-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Ye
GTID: 2404330572475016
Subject: Surgery
Abstract/Summary:
Background & objective: Breast cancer is currently the most common malignancy in women worldwide. Although breast cancer treatment has made great progress in the past decade, patient prognosis still warrants attention because tumor-specific mortality remains high. With the rapid development of precision medicine, high-throughput sequencing, and gene chip technology, the molecular basis of breast cancer development is being uncovered, and molecular markers and new targets have become crucial for assessing breast cancer risk, early diagnosis, predicting patient prognosis, and treatment. At present, clinical practice mainly guides the comprehensive treatment of breast cancer according to molecular subtype, yet molecular typing is based only on protein-coding genes, which make up less than 2% of the human genome sequence; the remaining non-coding sequences account for about 98%. More than 90% of non-coding sequences are transcribed, producing a large number of non-coding RNAs (ncRNAs). Among these, long non-coding RNAs (lncRNAs), which are longer than 200 nt, play a very important role in the occurrence, development, invasion, recurrence, and metastasis of breast cancer. Recent studies have shown that many gene and transcriptome changes occur during the malignant transformation of breast tissue, and that these changes are often closely related to abnormal lncRNA expression. In this study, we integrated the TCGA Breast Cancer (BRCA) transcriptome data to construct an lncRNA-related prognostic risk model for breast cancer, providing evidence and a reference for predicting the prognostic risk of breast cancer patients.

Methods: The Manifest and Metadata files for the TCGA-BRCA transcriptome were downloaded from The Cancer Genome Atlas (TCGA) website, and the raw HTSeq-Counts data were then downloaded in a cmd (command-line) environment using the GDC-client download tool. A Perl script was used to extract the expression matrix from the raw data. The Homo_sapiens.GRCh38.95.chr.gtf.gz file was downloaded from the Ensembl website and used to map gene identifiers to gene symbols, and the lncRNA expression matrix was then extracted with Perl. The R package "edgeR" was used to screen for differentially expressed lncRNAs, with the threshold set to |logFC| > 2.0 and adj.P.val < 0.05. Clinical survival data for TCGA-BRCA were downloaded from the TCGA database and merged with the lncRNA expression data using an R script. Univariate Cox regression analysis was performed first; lncRNAs were then selected by their univariate P value for LASSO regression analysis, and the lncRNAs retained according to the Lambda value were passed to multivariate Cox regression analysis. Based on the multivariate analysis, the lncRNA expression levels and regression coefficients were used to construct a survival-related linear risk assessment model. The risk score of each sample was calculated from the expression level and regression coefficient of each corresponding lncRNA, and the median risk score was taken as the cut-off value to divide the samples into high- and low-risk groups. Time-dependent ROC curves were used to evaluate the predictive ability of the prognostic model for 3-year and 5-year survival, and the C-index of the model was calculated. Kaplan-Meier survival analysis was further used to plot survival curves for the high- and low-risk groups. The R random function was used to divide the total sample randomly into two independent parts, "random group 1" and "random group 2", and the same statistical method was applied again to calculate the risk score of each sample. Each subgroup was divided into high- and low-risk groups according to its median risk score and was then analyzed by ROC curve and Kaplan-Meier survival analysis to verify the prognostic risk model.

Results: Transcriptome count data for a total of 1,222 samples were obtained from the TCGA database, including 113 normal samples and 1,109 tumor samples. After integration, 60,489 gene expression profiles were obtained, and 14,447 lncRNA expression profiles were extracted. Differential expression screening yielded 973 differentially expressed lncRNAs, of which 702 were up-regulated and 271 were down-regulated. Univariate Cox regression analysis of the differentially expressed lncRNAs retained 31 lncRNAs with P < 0.05. LASSO regression analysis of these 31 lncRNAs selected 15 lncRNAs according to the Lambda value, and the expression data of these 15 lncRNAs were recombined with the clinical data matrix for multivariate Cox regression analysis. Combined with the univariate results, 12 lncRNAs (AC010542.1, AC046158.1, AC079779.3, AC093025.1, AL031598.1, ERVK-28, LINC01405, LINC01733, LINC01962, MNX1-AS2, MTUS2-AS1, SLX1A-SULT1A3) had regression coefficients greater than zero, with HR (hazard ratio) = exp(coef) > 1, and were negatively correlated with patient survival time; 3 lncRNAs (LINC01710, MAPT-AS1, TCL6) had regression coefficients less than zero, with HR = exp(coef) < 1, and were positively correlated with patient survival time. The regression coefficients from the multivariate Cox analysis of the 15 lncRNAs were extracted, a prognostic risk scoring model consisting of the 15 lncRNAs was constructed, and the risk value of each sample was calculated. The samples were divided into high-risk and low-risk groups according to the median risk value. Heat maps for the high- and low-risk groups, ROC curves, and K-M survival curves were plotted in R. The time-dependent ROC curves indicated that the risk assessment model was stable in predicting 3-year and 5-year survival of breast cancer patients, with areas under the ROC curve (AUC) of 0.713 and 0.677, respectively. The C-index of the prognostic risk model was 0.69 (95% CI: 0.64-0.74), indicating that the model has good predictive ability. The K-M survival curves showed that overall survival was lower in the high-risk group, and the difference between the two groups was statistically significant (P = 3.93E-06). The AUCs for 3-year and 5-year survival were 0.661 and 0.633, respectively, in random group 1, and 0.766 and 0.724, respectively, in random group 2. The K-M survival curves of each subgroup also showed lower overall survival in the high-risk group, with statistically significant differences between the two groups (P = 0.00506415 and P = 0.00035932, respectively), indicating that the model has good stability and effectiveness.

Conclusion: The prognostic risk model based on the 15-lncRNA signature can predict the survival prognosis of breast cancer patients and has reference value for evaluating their prognosis. Combined with molecular-level prognostic factors of breast cancer, it can be used to identify high-risk patients and guide the development of individualized treatment options.
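The linear risk score and median split described in the Methods can be illustrated with a minimal Python sketch. This is not the thesis's R code: the function names are hypothetical, the two lncRNA names are borrowed from the signature only as labels, and the coefficients and expression values are toy numbers chosen for easy arithmetic, not results from the study. Assigning samples strictly above the median to the high-risk group is also an assumption, since the abstract does not say how ties are handled.

```python
import math
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum of (Cox coefficient * expression) per lncRNA."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def hazard_ratio(coef):
    """HR = exp(coef); coef > 0 gives HR > 1, coef < 0 gives HR < 1."""
    return math.exp(coef)


def split_by_median(scores):
    """Label each sample 'high' or 'low' using the median score as cut-off.

    Assumption: scores strictly above the median are 'high'.
    """
    cutoff = median(scores.values())
    groups = {
        sample: ("high" if score > cutoff else "low")
        for sample, score in scores.items()
    }
    return cutoff, groups


# Toy coefficients (illustrative only, NOT the fitted values from the thesis).
coefficients = {"LINC01733": 0.5, "TCL6": -0.25}

# Toy expression values per sample (illustrative only).
samples = {
    "s1": {"LINC01733": 2.0, "TCL6": 4.0},
    "s2": {"LINC01733": 6.0, "TCL6": 0.0},
    "s3": {"LINC01733": 1.0, "TCL6": 8.0},
    "s4": {"LINC01733": 4.0, "TCL6": 4.0},
}

scores = {name: risk_score(expr, coefficients) for name, expr in samples.items()}
cutoff, groups = split_by_median(scores)

for name in sorted(scores):
    print(name, scores[name], groups[name])
```

In this toy setup a positive coefficient raises the score as expression rises, and a negative coefficient lowers it, which matches the sign interpretation given in the Results (HR above or below 1).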
Keywords/Search Tags: breast cancer, TCGA, lncRNA, prognostic risk model, Cox regression