Background: Lung cancer is responsible for a large proportion of cancer-related deaths worldwide, and delayed detection is perhaps the most significant factor behind its high mortality rate. Although the National Lung Screening Trial supports screening of certain high-risk populations, these screening efforts have not yet been implemented successfully in practice, and the demand for them remains high. Against this background, and in the context of precision medicine, radiomics emerged. In 2012, P. Lambin, a researcher based in the Netherlands, first proposed the concept of “radiomics” and defined it as the high-throughput extraction of a large number of image features from radiographic images. Radiomics has since attracted considerable attention, and in 2014 the definition was updated to the high-throughput, automated (or semi-automated) extraction of large amounts of quantifiable information (image features) from a region of interest (ROI) in radiographic images. Radiomics aims to decode the intrinsic heterogeneity, genetic characteristics, and other phenotypes of a lesion in order to improve clinical management. Radiomics features are broadly classified into four categories according to the image attributes they capture: intensity, structure, texture/gradient, and wavelet. Many studies have shown correlations between these features and the malignant potential of nodules on chest CT. In this paper, we quantitatively extracted and optimized radiomics features for lung cancer cases and then analyzed these cases using machine learning methods.

Methods: We obtained CT images of 224 patients from the LIDC database and 263 patients from a tumor hospital, and extracted 867 radiomics features. First, the pulmonary nodules in the chest CT scans were segmented; then, the region of interest of each nodule was extracted; finally, all radiomics features were extracted. Based on the characteristics of lung nodules, we designed 5 groups of 62 radiomics features to form the radiomics feature space of each sample. Dimensionality reduction was then performed on these features using 10 methods, and finally 7 machine learning methods were used for classification. All analyses were performed in MATLAB R2017b.

Results: The radiomics features were divided into two-dimensional and three-dimensional features. The two-dimensional features include one-dimensional radiomics features, basic shape and size features, two-dimensional gray-level run-length matrix features (GLRLM-2D), Laws texture features (Law-Textures), and Laplacian of Gaussian (LoG) second-order edge features. The three-dimensional features include three-dimensional gray-level co-occurrence matrix features, three-dimensional gray-level size-zone matrix features (GLSZM-3D), and multi-scale three-dimensional wavelet features. The combination of the two- and three-dimensional features is referred to as hybrid (mixed) radiomics features. Using random forests, we built classifiers to distinguish benign from malignant lung nodules based on the two-dimensional, three-dimensional, and mixed radiomics features; of the three feature sets, the mixed features achieved the highest accuracy. With the random forest classifier, the LIDC data gave ACC = 76.26% and AUC = 0.6571, and the tumor hospital data gave ACC = 76% and AUC = 0.8667. With the support vector machine classifier, the LIDC data gave ACC = 76.37% and AUC = 0.6429, and the tumor hospital data gave ACC = 72% and AUC = 0.7733. Thus the LIDC data yielded higher classification accuracy but lower AUC, whereas the tumor hospital data showed the opposite pattern. Across repeated experiments, the accuracy on the LIDC data was about 3% higher than on the tumor hospital data. We therefore speculate that the source of lung nodule data affects the construction of the classifier.

Conclusion: Radiomics can be used to distinguish benign from malignant pulmonary
nodules. Texture-based computer-aided diagnosis systems can improve diagnostic efficacy for pulmonary nodules.
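The texture features named in this abstract are standard radiomics descriptors. As an illustration only, the following minimal Python sketch computes a two-dimensional gray-level co-occurrence matrix (GLCM) for a single pixel offset, plus three common statistics derived from it (contrast, energy, homogeneity). It is not the authors' MATLAB implementation, which used a three-dimensional GLCM; the function names, the min-max gray-level quantization, and the single horizontal offset are assumptions chosen for brevity.

```python
import numpy as np


def quantize(roi, levels=8):
    """Map ROI intensities linearly onto integer gray levels 0..levels-1.

    Assumption: simple min-max binning; many radiomics pipelines use a
    fixed bin width instead.
    """
    roi = np.asarray(roi, dtype=float)
    lo, hi = roi.min(), roi.max()
    if hi == lo:  # flat ROI: every pixel falls in level 0
        return np.zeros(roi.shape, dtype=int)
    q = np.floor((roi - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)  # the maximum value would otherwise map to `levels`


def glcm_2d(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM of a quantized 2-D image for one (row, col) offset."""
    img = np.asarray(image, dtype=int)
    dr, dc = offset
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1.0
    P = P + P.T  # count each neighbor pair in both directions
    total = P.sum()
    return P / total if total > 0 else P


def glcm_features(P):
    """Contrast, energy (angular second moment) and homogeneity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "energy": float(np.sum(P ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
    }


# Usage on a toy 4x4 "ROI" with a checkerboard pattern (hypothetical data).
roi = np.array([[0, 9, 0, 9],
                [9, 0, 9, 0],
                [0, 9, 0, 9],
                [9, 0, 9, 0]])
feats = glcm_features(glcm_2d(quantize(roi, levels=2), levels=2))
print(feats)  # {'contrast': 1.0, 'energy': 0.5, 'homogeneity': 0.5}
```

A perfectly alternating pattern gives the maximum contrast possible with two gray levels and low homogeneity, whereas a flat ROI would give contrast 0 and energy 1. A 3-D variant would iterate over voxel offsets in a volume instead of pixel offsets in a slice.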
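The results report both accuracy (ACC) and area under the ROC curve (AUC), and the two metrics rank the LIDC and tumor hospital data differently. A minimal Python sketch of both metrics shows how this can happen: accuracy depends on the decision threshold applied to a classifier's scores, whereas AUC is threshold-free and equals the probability that a randomly chosen malignant nodule scores higher than a randomly chosen benign one. The function names, the 0.5 threshold, and the toy scores are assumptions for illustration; this is not the authors' evaluation code.

```python
def predict_at_threshold(scores, threshold=0.5):
    """Turn malignancy scores into hard labels (1 = malignant, 0 = benign)."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label (ACC)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via the pairwise (Mann-Whitney) definition; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): 2 benign (0) and 2 malignant (1) nodules.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8]
acc = accuracy(y_true, predict_at_threshold(scores))
auc = roc_auc(y_true, scores)
print(acc, auc)  # 0.5 0.75
```

Here the scores rank most malignant nodules above benign ones (AUC = 0.75), yet a fixed 0.5 threshold misclassifies half of the cases. Moving the threshold changes ACC but leaves AUC unchanged, so the two metrics can disagree when comparing datasets.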