| The incidence of lung cancer is second only to that of breast cancer, while its mortality ranks first among malignant tumors. PET/CT imaging is of great significance in the early diagnosis, staging, histological typing, and treatment response assessment of lung cancer. Traditional SUV metrics and TNM staging cannot reflect the overall heterogeneity of tumors, resulting in limited diagnostic and predictive performance for lung cancer. By extracting high-throughput features from medical images and mining meaningful biomarkers correlated with clinical outcomes using machine-learning modeling, radiomics can provide information on the biological characteristics of tumors and patient prognosis. From the clinical point of view, a radiomics-based diagnostic model to distinguish lung cancer from active tuberculosis is lacking, and a prediction model to evaluate the efficacy of EGFR TKI therapy in patients with lung cancer has not yet been established. From the modeling point of view, radiomics models have poor repeatability and limited generalization ability because variability in imaging protocols introduces non-biological, system-related errors into the features, and because imaging data are frequently class-imbalanced, resulting in inaccurate predictions from data-driven machine-learning models. This thesis focuses on the precise diagnosis and treatment of lung cancer, as well as on multi-center data heterogeneity and the class imbalance problem. Four aspects of work are carried out, covering the clinical applications of radiomics and the construction of harmonized and/or balanced models: (1) Differential diagnosis of lung cancer and pulmonary tuberculosis based on PET/CT radiomics features and semantic characteristics. A total of 97 patients with solid lung cancer and 77 patients with solid active pulmonary tuberculosis (lesions within 3 cm in diameter) were collected. A PET/CT radiomics signature was built by evaluating feature robustness, redundancy, and predictive effectiveness. Multivariable
logistic regression analysis was then conducted, integrating the PET/CT signature with semantic features via backward stepwise selection, to develop an individualized diagnostic model, displayed as a radiomics nomogram. The results demonstrated that the nomogram performed better than diagnosis using PET features, CT features, PET/CT features, or semantic features alone, and was comparable to the diagnosis of nuclear medicine physicians (AUC of nomogram: 0.93, 95% CI: 0.87-0.97; accuracy: 0.85, 95% CI: 0.79-0.94). (2) Treatment response prediction for EGFR TKIs in patients with lung cancer based on PET/CT radiomics features and clinicopathologic characteristics. A total of 250 patients with stage ⅢC/Ⅳ EGFR-mutant non-small cell lung cancer were collected from two centers. A PET/CT radiomics model, a clinical model, and a combined model of both were established to predict the probability of progression, assessed by progression-free survival, after EGFR TKI therapy. The results showed that Kaplan-Meier curves differed significantly between the low- and high-risk groups stratified by the PET/CT radiomics model, which can therefore identify populations sensitive to targeted therapy. No significant differences between Kaplan-Meier curves were found when patients were stratified by the clinical model or the combined model. (3) Impact of harmonization and oversampling methods on multi-center imbalanced PET radiomics. A total of 245 patients with adenocarcinoma and 78 patients with squamous cell carcinoma were collected from 4 centers. We trained, validated, and externally tested different machine-learning classifiers to investigate the effect of 4 harmonization and 5 oversampling methods on PET radiomics-based subtype prediction. We found that applying harmonization and oversampling had a positive effect on predictive performance, although the effect varied across classifiers. The combination of ComBat and SMOTE with a random forest classifier performed best, with the validation AUC increased from 0.608 to 0.725 and the G-mean
increased from 0.398 to 0.625 relative to the baseline (no harmonization and no oversampling). On this basis, we created an open-source data preprocessing system for improving classifier performance in different clinical tasks through systematic comparison of harmonization and oversampling techniques. (4) Deep learning-based harmonization of CT reconstruction kernels towards improved generalization performance in clinical tasks. A deep learning-based image conversion method was proposed to reduce the impact of CT reconstruction kernels on radiomics analysis. The proposed method improved the reproducibility of radiomics features and the generalization performance of radiomics models between the B30f and B70f kernels for predicting lymph node metastasis in patients with lung adenocarcinoma and for differentiating lung cancer from pulmonary tuberculosis. Furthermore, phantom evaluations showed that the proposed method can be generalized to harmonize the majority (17 of 20) of kernels from different scanners and vendors, facilitating the comparability of radiomics features extracted from images reconstructed with various kernels. In conclusion, PET/CT radiomics features can comprehensively describe tumor heterogeneity and have shown great potential for personalized and precise diagnosis and treatment of lung cancer. Applying data preprocessing techniques such as harmonization and/or oversampling can help improve the generalization performance of radiomics models, promote multi-center radiomics research, and accelerate the clinical translation of radiomics. |
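
The G-mean reported in part (3) is not defined in the abstract. The sketch below assumes the common definition for binary classification, the geometric mean of sensitivity and specificity, and illustrates why it may be preferred to accuracy on imbalanced data. Only the class sizes (245 adenocarcinoma, 78 squamous cell carcinoma) come from the abstract; the function names and the predictions are hypothetical and are not the thesis's implementation or results.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Class sizes from the abstract: 0 = adenocarcinoma, 1 = squamous cell carcinoma.
y_true = [0] * 245 + [1] * 78

# Hypothetical degenerate classifier that always predicts the majority class.
y_majority = [0] * len(y_true)

# Hypothetical classifier that also recovers half of the minority class.
y_half = [0] * 245 + [1] * 39 + [0] * 39

if __name__ == "__main__":
    for name, y_pred in [("majority-only", y_majority), ("half-minority", y_half)]:
        print(name, round(accuracy(y_true, y_pred), 3), round(g_mean(y_true, y_pred), 3))
```

Under this definition, the majority-only classifier reaches an accuracy of 245/323 while its G-mean is zero, because it never identifies the minority class. This is the kind of failure that oversampling methods such as SMOTE aim to reduce.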