| Objective: Lung cancer has become a serious threat to human health and a public health problem, because its incidence and mortality have been increasing year by year. Data mining technology has been widely studied in the medical field because of its clear advantages in handling large-sample, multi-parameter problems. In recent years, we have been working on lung cancer diagnosis and have studied ten biomarkers of lung cancer: serum carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), gastrin, sialic acid (SA), copper-zinc ratio (Cu/Zn), serum calcium ion, DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3A (DNMT3A), DNA methyltransferase 3B (DNMT3B) and histone deacetylase 1 (HDAC1). How to use these biomarkers effectively in the diagnosis of lung cancer still requires further study. In this work, auxiliary diagnosis systems for lung cancer were established by combining data mining technology with the 10 biomarkers and with epidemiological and clinical data. The predictive value of the ten biomarkers for early lung cancer was explored. The auxiliary diagnosis systems were constructed with an artificial neural network (ANN) model, the decision tree (DT) C5.0 model, a support vector machine (SVM) model and the Fisher discriminant analysis model. The optimal prediction model lays the foundation for subsequent research; it could improve the auxiliary diagnostic value of tumor markers for lung cancer and serve the goal of auxiliary diagnosis and differential diagnosis. Materials and methods: 1.
The objects of study: The samples, comprising 180 cases in the lung cancer group and 243 cases in the benign lung disease group, were taken from the First Affiliated Hospital of Zhengzhou University, and all samples were confirmed by cytological or pathological diagnosis. 2. Methods: Radioimmunoassay was used to measure the serum levels of CEA, NSE and gastrin; ICP-MS was used for serum copper and zinc concentrations; the improved resorcinol chromogenic method was used for sialic acid; a fully automated biochemical analyzer was used for serum calcium concentration; and ELISA was used for the levels of DNMT1, DNMT3A, DNMT3B and HDAC1. 3. Data mining: The samples were randomly divided into a training set and a prediction set at a ratio of 3:1. The aided diagnosis models were developed using ANN, C5.0, Fisher discriminant analysis and SVM and then verified on the prediction set; the prediction results of the four models were compared using diagnostic test evaluation indices and the ROC curve. The three kinds of models were implemented in Clementine 12.0. 4. Statistical analysis: Statistical methods and tests for quantitative data were chosen according to the type of distribution, and the chi-square test was used to compare qualitative data. The significance level was 0.05. Results: 1. There were statistically significant differences (P < 0.05) between the lung cancer group and the benign lung disease group in the levels of NSE, CEA, gastrin, DNMT1, DNMT3A and DNMT3B, and the levels of these tumor markers were higher in the lung cancer group than in the benign lung disease group. 2. Combining fever, sweating, cough, phlegm, blood in phlegm, and history of inflammation and nodules can effectively improve the accuracy of the auxiliary diagnosis model. 3.
According to the area under the ROC curve (AUC), the ANN model based on the optimal biomarkers performed best among the six kinds of models for the auxiliary diagnosis of lung cancer, but the differences were not statistically significant (P > 0.05). Conclusion: The ANN model based on epidemiological characteristics (gender, age, smoking history), clinical symptoms (sputum and blood in the sputum, fever, sweating, history of inflammation) and tumor markers (DNMT3B, DNMT1, HDAC1, NSE, gastrin and CEA) performs better for the diagnosis of lung cancer. It could be applied to the aided diagnosis of lung cancer in clinical practice. |
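
The evaluation procedure described in the abstract (a random 3:1 split into training and prediction sets, followed by comparison using diagnostic test indices and the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' Clementine 12.0 workflow: the function names `split_3_to_1`, `diagnostic_indices` and `auc` are hypothetical, labels are assumed to be coded 1 for lung cancer and 0 for benign lung disease, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than whatever routine the original software used.

```python
import random


def split_3_to_1(samples, seed=0):
    """Shuffle the samples and split them into training and prediction sets at 3:1.

    Hypothetical helper: the abstract states only the ratio, not the procedure.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4  # first three quarters go to training
    return shuffled[:cut], shuffled[cut:]


def diagnostic_indices(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = cancer, 0 = benign).

    Assumes both classes occur in y_true, otherwise a division by zero is raised.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a positive case
    receives a higher score than a negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With the 423 samples reported above (180 lung cancer, 243 benign), this split would place 317 samples in the training set and 106 in the prediction set; each model's predicted labels and scores on the prediction set would then be passed to `diagnostic_indices` and `auc` for comparison.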