Background and Objective

Lung cancer has a high incidence and mortality worldwide, has become a major public health problem, and seriously threatens human health. Studies have shown that the 5-year survival rate of lung cancer is below 20%, but that it can be improved to 70%-80% if appropriate detection and treatment are adopted at an early stage. Screening for early-stage biomarkers is therefore an effective means of reducing the harm of lung cancer: it supports early screening and treatment and can also reduce the social burden of the disease. The purpose of this study was to use multi-omics technologies to reveal molecules that are abnormally expressed in the plasma and lesions of patients with early-stage lung cancer, and thereby to provide candidate molecular markers for early lung cancer screening. These candidate markers were then tracked dynamically in lung cancer models established in vivo and in vitro, to provide experimental evidence for their early application. Finally, the candidate markers were combined with traditional tumor markers (CEA, NSE, and CYFRA21-1) to develop data mining models for the early warning of lung cancer in coke oven workers.

Materials and Methods

1. Integration of multi-omics technologies to investigate candidate molecular markers of early-stage lung cancer
Five patients with early-stage lung adenocarcinoma (LUAD), five patients with early-stage lung squamous cell carcinoma (LUSC), and eight healthy controls were enrolled. Alterations in peripheral blood proteins were detected by label-free quantitative proteomics. Sequencing data for 395 patients with early-stage LUAD, 406 patients with early-stage LUSC, and 43 tumor-adjacent tissues were also collected from The Cancer Genome Atlas (TCGA) database, and gene expression levels in lung tissue were obtained by transcriptome analysis. Differentially expressed proteins and genes were screened by t-test/limma analysis, and the genes corresponding to the differentially expressed proteins were obtained by alignment with the UniProt database. At the transcriptional level, the intersection of the differentially expressed molecules in peripheral blood and in lung cancer lesions was obtained. The effect of the related genes on the survival time of patients with early-stage lung cancer was analyzed with the Kaplan-Meier Plotter, and candidate molecular markers of lung cancer were identified after verification in the Oncomine microarray database.

2. Malignant transformation of BEAS-2B cells induced by coal tar pitch extracts
The main components of coal tar pitch extracts (CTPE) were analyzed by gas chromatography-mass spectrometry. BEAS-2B cells were initially exposed to 15.04 μg/mL CTPE 5, 10, or 15 times and then cultured to passage 40. Cells immediately after exposure were designated passage 0, and the first passages of the different groups were named CTPE5-1, CTPE10-1, and CTPE15-1, respectively. Cell morphology was observed under an inverted microscope, and malignancy was assessed with the plate colony formation assay and a tumorigenicity assay in immunodeficient mice. Protein expression levels were determined by Western blot.

3. Tumorigenesis in C57BL/6 mice induced by coal tar pitch extracts
One hundred and eighty SPF mice were randomly divided into a normal control group, a vehicle control group, and a CTPE group. Mice in the CTPE group were intratracheally instilled with 1 mg CTPE per mouse once a week for 4 consecutive weeks (4 instillations in total). Mice were dissected at 3, 6, 9, and 12 months after the first instillation. Tumors were observed and counted, their diameters were measured with a vernier caliper, and tumor size was quantified. Pathological changes in lung tissue were observed by HE staining. Protein concentration was determined by the BCA assay. Western blot was used to detect protein expression in lung tissue, and ELISA was used to detect protein expression in plasma and bronchoalveolar lavage fluid (BALF).

4. Detection of molecular markers in plasma and construction of a prediction model for lung cancer
A total of 185 lung cancer patients, 163 healthy controls, and 163 coke oven workers were recruited, and their characteristics were recorded. Plasma levels of the molecular markers were detected by ELISA. SPSS Clementine 12.0 was used to divide all samples into a training set and a prediction set at a ratio of 3:1. Decision tree C5.0, artificial neural network (ANN), support vector machine (SVM), and Fisher discriminant models were established to identify individuals at high risk of lung cancer among coke oven workers.

5. Statistical analysis
Data were organized and a database was built in Microsoft Excel 2016. Statistical description and analysis were performed with SPSS 21.0. Normally distributed data were expressed as mean ± standard deviation; differences between two groups were compared with the t-test and differences among three or more groups with ANOVA. Non-normally distributed data were expressed as median with lower and upper quartiles; differences between two groups were compared with the Mann-Whitney U test and differences among three or more groups with the Kruskal-Wallis H test. Qualitative data were expressed as frequencies and corresponding rates and compared with the chi-square test. The significance level was set at α = 0.05.

Results

1. Combination of multi-omics technologies to investigate candidate molecular markers of early-stage lung cancer

1.1 Screening of dysregulated plasma proteins in patients with early-stage lung cancer
Label-free quantitative proteomics identified 32 differentially expressed proteins in LUAD, of which 28 (including SEPP1, CAT, and CSR) were up-regulated and 4 (APOA2, APOB, CETP, and LCAT) were down-regulated. In LUSC, 19 differentially expressed proteins were identified: 11 were up-regulated, mainly HBB, FGA, and HYDIN, and 8 were down-regulated, including AOC3, CLEC3B, and LYVE1.

1.2 Screening of differentially expressed genes in patients with early-stage lung cancer
Based on the TCGA database, 3844 dysregulated genes were identified in early-stage LUAD (2251 up-regulated and 1593 down-regulated) and 5562 in LUSC (3157 up-regulated and 2405 down-regulated). A total of 2849 dysregulated genes were shared by LUAD and LUSC.

1.3 Screening of candidate molecular markers of early-stage lung cancer by conjoint analysis of the proteome and transcriptome
The UniProt database was searched for the genes corresponding to the differentially expressed proteins in peripheral blood. In LUAD, 32 genes were obtained, of which 14 were also differentially expressed in lung cancer lesions (LUAD and LUSC). In LUSC, 19 genes were obtained, of which 6 were also differentially expressed in lung cancer lesions (LUAD and LUSC).

1.4 Effects of candidate molecular markers on the survival of patients with early-stage lung cancer
Survival models were constructed for the 20 dysregulated genes on the Kaplan-Meier Plotter online platform, which included 652 patients with stage I disease. The expression levels of CLEC3B, AOC3, HBB, CAT, SEPP1, FGA, and ORM1 were associated with the survival of patients with early-stage lung cancer.

1.5 Verification of candidate molecular markers in lung cancer based on microarray data
Lung cancer microarray datasets were screened in the Oncomine database. CLEC3B, AOC3, HBB, CAT, and SEPP1 were down-regulated, consistent with the results from the TCGA database.

2. Construction and identification of the malignant transformation model of BEAS-2B cells induced by coal tar pitch extracts

2.1 Analysis of the main components of coal tar pitch extracts
Gas chromatography-mass spectrometry identified 34 main components, including 15 polycyclic aromatic hydrocarbons (43.985%) and 19 heterocyclic hydrocarbons (47.216%). The polycyclic aromatic hydrocarbons were mainly three-, four-, and five-ring compounds. The main tricyclic aromatic hydrocarbon was phenanthrene (1.991%). The main tetracyclic aromatic hydrocarbons were fluoranthene (11.357%), pyrene (10.106%), benzo[a]anthracene (3.775%), triphenylene (2.282%), and naphthacene (1.784%). The main pentacyclic aromatic hydrocarbons were benzo[a]pyrene (4.283%), benzo[e]pyrene (3.140%), benzo[b]fluoranthene (1.238%), and benzo[k]fluoranthene (0.843%).

2.2 Confirmation of the malignant transformation model of BEAS-2B cells induced by coal tar pitch extracts
After exposure to CTPE, the cells showed morphological changes such as irregular shape, unclear contours, vacuoles, and burrs. Compared with the control group, the number of colonies was significantly increased in the CTPE5-30 and CTPE5-40 groups (P<0.05), and the number of colonies in the CTPE15-40 group was higher than in the CTPE5-40 and CTPE10-40 groups (P<0.05). Tumorigenicity experiments in immunodeficient mice showed tumors in the CTPE5-40, CTPE10-40, and CTPE15-40 groups but not in the CTPE5-30 group. Tumors formed by transformed cells in the CTPE15-40 group weighed more than those in the CTPE5-40 and CTPE10-40 groups (P<0.05).

2.3 Expression of candidate molecular markers during the malignant transformation of BEAS-2B cells induced by coal tar pitch extracts
Compared with the control group, AOC3 was up-regulated while CAT and CLEC3B were down-regulated from passage 10 onward in the CTPE5 group (P<0.05). SEPP1 levels did not differ significantly at passages 10 and 20 but decreased at passages 30 and 40 in the CTPE-exposed group (P<0.05). Compared with the control group, AOC3 expression was increased in the CTPE5-40, CTPE10-40, and CTPE15-40 groups (P<0.05), and was higher in the CTPE15-40 group than in the other two groups (P<0.05). The levels of CAT, CLEC3B, and SEPP1 were decreased in the CTPE5-40, CTPE10-40, and CTPE15-40 groups (P<0.05). CAT was higher in the CTPE5-40 group than in the CTPE10-40 and CTPE15-40 groups (P<0.05), CLEC3B was lower in the CTPE15-40 group than in the other two groups (P<0.05), and SEPP1 was lower in the CTPE15-40 group than in the CTPE5-40 group (P<0.05).

2.4 Expression of candidate molecular markers in BEAS-2B cells and A549 cells
Compared with BEAS-2B cells, A549 cells showed up-regulated AOC3 (P<0.05) and down-regulated CAT and CLEC3B (P<0.05). SEPP1 levels did not differ significantly between BEAS-2B and A549 cells (P>0.05).

3. Establishment and identification of the lung cancer model in C57BL/6 mice induced by coal tar pitch extracts

3.1 Confirmation of the lung cancer model induced by coal tar pitch extracts in C57BL/6 mice
After CTPE exposure, no tumors were observed in the CTPE group at 3 months, whereas tumors were detected at 6, 9, and 12 months, with incidence rates of 26.67%, 46.67%, and 93.33%, respectively. The number of tumors at 12 months was higher than at 9 and 6 months (P<0.05). Pathological examination showed that CTPE induced LUSC, LUAD, and adenosquamous carcinoma at rates of 23.08%, 53.85%, and 23.07%, respectively.

3.2 Expression of candidate molecular markers during lung tumorigenesis in CTPE-exposed mice
Compared with the control group, AOC3, CLEC3B, SEPP1, and HBB were down-regulated in lung tissue after 3, 6, 9, and 12 months of CTPE exposure (P<0.05). CAT did not differ significantly at 3 and 6 months (P>0.05), but decreased at 9 and 12 months (P>0.05).

AOC3 expression was increased in the plasma of CTPE-exposed mice: compared with the control group, plasma AOC3 did not differ at 3 months (P>0.05) but was up-regulated at 6, 9, and 12 months (P<0.05), and AOC3 levels at 12 and 9 months were higher than at 3 months (P<0.05). After CTPE exposure, CAT expression first increased and then decreased: compared with the control group, CAT was increased at 3 and 6 months (P<0.05), did not differ at 9 months (P>0.05), and decreased at 12 months (P<0.05); CAT levels at 9 and 12 months were also lower than at 3 months (P<0.05). CLEC3B, SEPP1, and HBB did not differ significantly across time points (P>0.05).

Compared with the control group, AOC3 in BALF decreased at 6 and 9 months (P<0.05) and increased at 12 months after CTPE exposure (P<0.05). CAT in BALF was increased at 3 months of CTPE exposure (P<0.05), with no differences at 6, 9, and 12 months. CLEC3B in BALF was down-regulated at 6, 9, and 12 months (P<0.05), and HBB was down-regulated at different time points after CTPE exposure (P<0.05). There were no significant differences in SEPP1 levels at different time points after CTPE exposure (P>0.05).

4. Detection of molecular markers in human plasma and construction of a prediction model for lung cancer

4.1 Comparison of eight plasma proteins among lung cancer patients, healthy controls, and coke oven workers
Compared with healthy controls, lung cancer patients had increased levels of AOC3, CEA, CYFRA21-1, and NSE (P<0.05) and decreased levels of CLEC3B and HBB (P<0.05); CAT and SEPP1 did not differ significantly between the two groups (P>0.05). Compared with the control group, coke oven workers had decreased levels of CAT, SEPP1, and HBB (P<0.05) and an increased level of CYFRA21-1 (P<0.05); AOC3, CLEC3B, CEA, and NSE did not differ significantly between the two groups (P>0.05).

4.2 Comparison of eight plasma proteins in lung cancer patients at different stages
Compared with the healthy control group, patients with early-stage lung cancer had increased plasma CEA and CYFRA21-1 (P<0.05) and decreased CLEC3B (P<0.05), with no significant differences in AOC3, CAT, SEPP1, HBB, or NSE (P>0.05). Plasma AOC3 and CEA were lower in patients with early-stage lung cancer than in those with advanced lung cancer (P<0.05). The expression levels of CAT, CLEC3B, SEPP1, HBB, CYFRA21-1, and NSE were not correlated with lung cancer stage (P>0.05).

4.3 Relationship between histopathological type and the expression of eight plasma proteins
CAT and SEPP1 levels were higher in LUSC than in LUAD and small cell lung cancer (SCLC) (P<0.05). NSE was significantly higher in SCLC than in LUAD and LUSC (P<0.05). CEA, AOC3, CLEC3B, HBB, and CYFRA21-1 did not differ significantly among the three pathological types (P>0.05).

4.4 Evaluation and application of data mining models
Among the models based on the five candidate molecular markers (AOC3, CAT, CLEC3B, SEPP1, HBB), the ANN model performed best, with an AUC of 0.755, accuracy of 76.29%, sensitivity of 78.69%, and specificity of 72.22%. Among the models based on the three traditional markers (CEA, CYFRA21-1, NSE), the decision tree C5.0 model performed best, with an AUC of 0.678, sensitivity of 60.66%, specificity of 75.00%, and accuracy of 65.98%. When the five candidate markers were combined with the three traditional tumor markers, the decision tree C5.0 and ANN models performed best, with AUCs of 0.868 and 0.844, accuracies of 85.57% and 82.47%, sensitivities of 81.97% and 77.05%, and an identical specificity of 91.67%. Applying these models to coke oven workers identified 14 individuals at high risk of lung cancer.

Conclusion

1. Differentially expressed molecules were first identified by multi-omics technologies. Lung cancer models induced by coal tar pitch extracts were then successfully established in vivo and in vitro, and observation and verification at different stages of lung tumorigenesis showed that changes in AOC3, CAT, CLEC3B, SEPP1, and HBB proteins are early molecular events. Furthermore, the clinical study confirmed that plasma levels of AOC3, CLEC3B, and HBB were associated with lung cancer.

2. Using data mining, lung cancer screening models were constructed from the candidate molecular markers (AOC3, CAT, CLEC3B, SEPP1, HBB) and traditional tumor markers (CEA, CYFRA21-1, NSE); the decision tree C5.0 and ANN models showed the best predictive performance. These models may be applicable to screening coke oven workers for individuals at high risk of lung cancer.
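The conjoint analysis in Methods step 1 and Results 1.3 reduces to set operations once each dysregulated plasma protein has been mapped to its gene through UniProt. The sketch below is a minimal illustration under stated assumptions: the mapping step is taken as already done, the gene symbols are hypothetical placeholders rather than the study's data, and `conjoint_candidates` is a hypothetical helper. The abstract does not say whether a gene had to be dysregulated in both LUAD and LUSC lesions or in either one, so that choice is exposed as the `require_all` flag.

```python
from typing import Dict, Iterable, Set


def conjoint_candidates(
    plasma_genes: Iterable[str],
    tissue_degs_by_subtype: Dict[str, Set[str]],
    require_all: bool = True,
) -> Set[str]:
    """Hypothetical helper: genes dysregulated in plasma AND in tumour tissue.

    plasma_genes: genes mapped (e.g. via UniProt) from dysregulated plasma proteins.
    tissue_degs_by_subtype: differentially expressed genes per subtype (e.g. LUAD, LUSC).
    require_all: True -> gene must be a DEG in every subtype (intersection);
                 False -> a DEG in any subtype is enough (union). This is an assumption.
    """
    subtype_sets = list(tissue_degs_by_subtype.values())
    if not subtype_sets:
        raise ValueError("at least one subtype DEG set is required")
    if require_all:
        tissue_genes = set.intersection(*subtype_sets)
    else:
        tissue_genes = set.union(*subtype_sets)
    return set(plasma_genes) & tissue_genes


# Placeholder symbols only; they do not reproduce the study's gene lists.
plasma = ["GENE_A", "GENE_B", "GENE_C"]
tissue = {"LUAD": {"GENE_A", "GENE_B", "GENE_X"}, "LUSC": {"GENE_A", "GENE_Y"}}
print(sorted(conjoint_candidates(plasma, tissue)))                     # ['GENE_A']
print(sorted(conjoint_candidates(plasma, tissue, require_all=False)))  # ['GENE_A', 'GENE_B']
```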
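Results 1.4 relies on Kaplan-Meier survival curves. The Kaplan-Meier Plotter platform performs this computation itself, typically after splitting patients into high- and low-expression groups and comparing them with a log-rank test. The sketch below only illustrates the underlying product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i/n_i), where d_i is the number of deaths at t_i and n_i the number still at risk. The function name and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of survival.

    times: follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns (time, S(time)) at each distinct time with at least one death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose time equals t (both deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # patients censored at t leave the risk set after t
    return curve


# Toy data: 5 patients, times in months.
print([(t, round(s, 3)) for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])])
# [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Patients censored at a tied time are counted as still at risk at that time, which is the usual convention for the estimator.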
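The decision rules in Methods step 5 can be written as a small lookup, which makes the analysis plan explicit. This is a hypothetical helper for illustration, not SPSS functionality; the abstract does not state how normality was judged, so it is passed in as a flag.

```python
from typing import Tuple

ALPHA = 0.05  # significance level stated in the Methods


def choose_analysis(data_type: str, n_groups: int, normal: bool = True) -> Tuple[str, str]:
    """Return (descriptive summary, comparison test) following the Methods rules.

    data_type: "quantitative" or "qualitative".
    normal: whether quantitative data are normally distributed (assessed beforehand).
    """
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if data_type == "qualitative":
        return ("frequency and rate", "chi-square test")
    if data_type != "quantitative":
        raise ValueError(f"unknown data type: {data_type!r}")
    if normal:
        return ("mean ± SD", "t-test" if n_groups == 2 else "ANOVA")
    return (
        "median (lower quartile, upper quartile)",
        "Mann-Whitney U test" if n_groups == 2 else "Kruskal-Wallis H test",
    )


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Compare a P value against the prespecified alpha."""
    return p_value < alpha


print(choose_analysis("quantitative", 3, normal=False))
# ('median (lower quartile, upper quartile)', 'Kruskal-Wallis H test')
```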
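Methods step 4 divides the samples into a training set and a prediction set at 3:1 using SPSS Clementine 12.0. The sketch below shows an equivalent random partition in plain Python; it is not Clementine's procedure, and the function name and default seed are hypothetical. A fixed seed makes the split reproducible, which matters when several model types (C5.0, ANN, SVM, Fisher discriminant) are compared on the same prediction set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_three_to_one(samples: Sequence[T], seed: int = 12345) -> Tuple[List[T], List[T]]:
    """Randomly partition samples into training and prediction sets at 3:1."""
    items = list(samples)
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    rng.shuffle(items)
    n_train = (3 * len(items)) // 4  # floor of 75% goes to the training set
    return items[:n_train], items[n_train:]


train, prediction = split_three_to_one(range(100))
print(len(train), len(prediction))  # 75 25
```

If the case/control ratio should be preserved in both sets, the same partition can be applied separately within each class (a stratified split); the abstract does not say whether the study stratified.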
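Sensitivity, specificity, and accuracy in Results 4.4 all derive from the confusion matrix of a model on the prediction set. The sketch below computes them; the counts are hypothetical. A prediction set of 61 cases and 36 controls is one composition consistent with every percentage reported in Results 4.4 (for example, 48/61 = 78.69% and 26/36 = 72.22%), but this is a back-calculation, not a figure the abstract reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts on a prediction set; 'positive' means lung cancer."""
    tp: int  # cases correctly predicted as lung cancer
    fn: int  # cases missed by the model
    tn: int  # controls correctly predicted as non-cancer
    fp: int  # controls wrongly flagged as lung cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def pct(x: float) -> str:
    """Format a proportion as a percentage with two decimals."""
    return f"{100 * x:.2f}%"


# Hypothetical counts (61 cases, 36 controls) chosen to match the
# five-marker ANN percentages; the real confusion matrix is not reported.
ann = ConfusionMatrix(tp=48, fn=13, tn=26, fp=10)
print(pct(ann.sensitivity), pct(ann.specificity), pct(ann.accuracy))
# 78.69% 72.22% 76.29%
```

Accuracy depends on the case/control mix of the prediction set, whereas sensitivity and specificity do not, which is why all three are reported together.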
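The AUC values in Results 4.4 summarize each model's ROC curve. For a set of model scores, the empirical AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half (the Mann-Whitney interpretation). The sketch below computes it directly from that definition on toy scores; it illustrates the quantity only and is not the software used in the study. The pairwise loop is quadratic and chosen for clarity rather than speed.

```python
from typing import Sequence


def empirical_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.4, 0.1]
print(round(empirical_auc(cases, controls), 4))  # 0.8333
```

An AUC of 0.5 means no discrimination and 1.0 means perfect separation; this is the scale on which the reported values of 0.678 to 0.868 should be read.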