| Objective: Lung cancer is a malignant tumor with a serious impact on human health and quality of life. Its incidence and mortality are high and have risen steadily in recent years, making it one of the most common malignant tumors worldwide. Diagnostic and treatment techniques for lung cancer have improved rapidly in recent years. However, early-stage lung cancer has no specific clinical symptoms, and there is no effective clinical method for early diagnosis in high-risk groups. By the time patients show typical clinical manifestations, most are already at an advanced stage. As a result, the clinical treatment and prognosis of lung cancer remain unsatisfactory. An effective approach to the early detection and diagnosis of lung cancer is therefore needed to improve treatment and prognosis, which is of great significance to patients' health and quality of life. Detection of serum tumor markers has emerged in recent years as a common method for the aided diagnosis of lung cancer. It is valuable for diagnosis, for evaluating clinical efficacy, and for monitoring disease progression. Its merits are that it is quantitative, objective, and minimally invasive, and that specimens are easy to obtain and allow repeated measurement. However, because no serum tumor marker is specific to lung cancer, single-marker tests produce false positives and false negatives. Combined detection of several tumor markers is therefore often used to improve the positive detection rate of early-stage malignancy. Joint detection provides a great deal of information, but the large number of parameters makes it difficult to obtain reliable results with general statistical methods. A decision tree extracts the inherent rules in the data and uses them to classify new data objects. 
The decision tree model has high sensitivity and specificity and is suitable for clinical syndrome diagnosis. The artificial neural network (ANN) is a computational model that processes data and information using a structure similar to the synaptic connections of the brain. It handles large numbers of parameters easily and provides a relatively simple and effective way to solve complex problems. A tumor marker protein biochip was used to measure the serum levels of nine tumor markers: CA199, NSE, CEA, CA242, Ferritin, AFP, CA125, HGH, and CA153. Based on these markers, newer data mining techniques and traditional classification methods were applied to identify effective features for the rapid diagnosis of lung cancer. This study established four models: decision tree, artificial neural network, Fisher discriminant analysis, and logistic regression analysis. Sensitivity, specificity, prediction accuracy, positive predictive value, negative predictive value, and the ROC curve were used to evaluate these models in the aided diagnosis of lung cancer. The aim is to achieve rapid assisted diagnosis and lay a foundation for improving the treatment and prognosis of lung cancer. Materials and methods: 1. All cases came from the Departments of Respiratory Medicine and Oncology, Fifth Affiliated Hospital of Zhengzhou University, from June 2010 to December 2011. Records of 202 lung cancer cases and 201 cases of benign lung disease were collected from the tumor marker protein biochip detection system. All patients were confirmed histopathologically. 2. The serum levels of CA199, NSE, CEA, CA242, Ferritin, AFP, CA125, HGH, and CA153 were measured with a multiple tumor marker protein biochip detection kit produced by Huzhou Biotechnology Co. Ltd. 3. 
A training set of about 75% of the lung cancer and benign lung disease cases (150 lung cancer cases and 150 benign cases) was randomly selected, and the decision tree, artificial neural network, logistic regression, and Fisher discriminant analysis were applied to it to build the corresponding models. All samples (202 lung cancer cases and 201 benign cases) were then used as the prediction set to assess the four models. Evaluation indices and ROC curves were used to compare the predictive performance of the four models on the prediction set. 4. SPSS 12.0 and Clementine 12.0 were used for statistical analysis. Quantitative data were summarized as medians and quartiles and compared with a non-parametric test for two independent samples. Qualitative data were analyzed with the χ² test. The significance level was set at 0.05. Results: 1. Of the nine selected tumor markers, the serum levels of AFP, CA125, CEA, NSE, CA242, CA153, and Ferritin were significantly higher in the lung cancer group than in the benign pulmonary disease group (P < 0.05). The positive rates of AFP, CA125, CA199, CEA, NSE, CA242, CA153, and Ferritin were significantly higher in lung cancer patients than in patients with benign pulmonary disease (P < 0.05). 2. Classification results of the four models: The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of the decision tree model were 92.08%, 92.54%, 92.31%, 92.54%, and 92.08%, respectively. The area under the ROC curve was 0.923. The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of the ANN model were 83.66%, 88.56%, 86.10%, 88.02%, and 84.36%, respectively. The area under the ROC curve was 0.861. The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of logistic regression analysis were 75.74%, 86.07%, 80.89%, 84.53%, and 77.93%, respectively. 
The area under the ROC curve was 0.809. The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of Fisher discriminant analysis were 63.86%, 89.05%, 76.43%, 85.43%, and 71.03%, respectively. The area under the ROC curve was 0.765. Conclusion: 1. Data mining technology combined with the nine tumor markers performed well in distinguishing lung cancer from benign pulmonary diseases. 2. Decision tree and ANN models combined with the nine tumor markers diagnosed and differentiated lung cancer better than logistic regression and Fisher discriminant analysis, and the decision tree model performed best among the four classification models. |
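The five evaluation indices reported above all follow from a 2x2 confusion table, so they can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not report raw counts, so the true/false positive and negative counts are an assumption, back-calculated from the reported percentages and the prediction set sizes (202 lung cancer, 201 benign). The names `Confusion` and `INFERRED` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion table; lung cancer is the positive class."""
    tp: int  # lung cancer predicted as lung cancer
    fn: int  # lung cancer predicted as benign
    tn: int  # benign predicted as benign
    fp: int  # benign predicted as lung cancer

    def metrics(self) -> Dict[str, float]:
        total = self.tp + self.fn + self.tn + self.fp
        sensitivity = self.tp / (self.tp + self.fn)
        specificity = self.tn / (self.tn + self.fp)
        return {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "accuracy": (self.tp + self.tn) / total,
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            # Mean of sensitivity and specificity: the area under a ROC
            # curve that passes through this single operating point.
            "single_point_auc": (sensitivity + specificity) / 2,
        }


# ASSUMPTION: counts are not given in the abstract. They are inferred so that
# tp + fn = 202 and tn + fp = 201 and the reported percentages are reproduced.
INFERRED = {
    "decision tree": Confusion(tp=186, fn=16, tn=186, fp=15),
    "ANN": Confusion(tp=169, fn=33, tn=178, fp=23),
    "logistic regression": Confusion(tp=153, fn=49, tn=173, fp=28),
    "Fisher discriminant": Confusion(tp=129, fn=73, tn=179, fp=22),
}

if __name__ == "__main__":
    for name, table in INFERRED.items():
        m = table.metrics()
        percents = ", ".join(
            f"{key}={100 * m[key]:.2f}%"
            for key in ("sensitivity", "specificity", "accuracy", "ppv", "npv")
        )
        print(f"{name}: {percents}, single_point_auc={m['single_point_auc']:.3f}")
```

Under these inferred counts, all five indices match the reported figures for each model. The mean of sensitivity and specificity also matches each reported ROC area to three decimals. This suggests, although the abstract does not state it, that the areas were computed from binary class predictions and not from continuous scores.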