| Intensity-modulated radiation therapy (IMRT) is one of the most widely used radiation therapy techniques. By modulating the beam intensity within the irradiation field, IMRT delivers a highly conformal dose distribution to the target while reducing exposure of the surrounding normal tissues, but this precision comes with high complexity. Strict and precise quality control is therefore required to ensure the therapeutic effect, and dose verification before a radiotherapy plan is delivered is an indispensable part of it. At present, dose verification mainly relies on measurement to obtain the actual dose distribution of the plan, and uses gamma analysis to compare the measured distribution with the distribution calculated by the treatment planning system, yielding the gamma passing rate (GPR). The acceptability of the plan is then evaluated against a given GPR threshold. However, this method is time-consuming and occupies accelerator time, which is inconvenient for clinical work. In addition, plans that fail dose verification must be redesigned, potentially delaying the patient’s treatment beyond the optimal time. Research has shown that using machine learning to predict GPR before the actual dose verification can serve as a powerful auxiliary tool, helping quality control staff identify in advance plans that are likely to fail dose verification, so that the plan can be optimized or redesigned early and delays in the patient’s treatment are reduced. Against this background, this paper selected 306 IMRT plans with a total of 2348 fields and used an electronic portal imaging device (EPID) for dose verification to obtain the actual GPR of each field. At the same time, complexity indices of the planned fields and dosimetric evaluation indices of the plans were extracted. The purpose of this paper is to use machine learning methods to establish a model, realize the prediction
of the GPR of IMRT, and select the algorithm with the best predictive performance through model evaluation and comparison. On this basis, methods to improve the accuracy of GPR prediction are discussed. The main research contents and conclusions are as follows: (1) To predict and classify the GPR of IMRT, artificial neural network (ANN), support vector machine (SVM), and random forest (RF) algorithms were used to build regression and classification models that learn the relationship between plan complexity and GPR. The results show that the RF model outperformed the ANN and SVM models and could predict GPR relatively accurately. (2) To improve the prediction accuracy of the model, dosimetric evaluation indices were added to the complexity features, and the RF algorithm was used to build models for predicting and classifying GPR. Compared with the RF model based on complexity features alone, prediction performance improved. In addition, all features were ranked by importance, and Spearman correlation analysis was performed between the features and the actual GPR. The results show that the number of segments (NS) had both the highest importance ranking and the highest correlation coefficient, indicating that NS had the greatest impact on GPR. Combining dosimetric evaluation indices with complexity indices provides an effective reference for improving the performance of GPR prediction models and for feature selection. The importance ranking of features could guide plan design and help explain why dose verification fails. (3) To explore ways of obtaining a more effective model, the original dataset containing different disease types was divided into 8 datasets according to disease type, and the random forest algorithm was used to build the models. The results show that, compared with the models
based on datasets containing different disease types, the models built on the breast cancer, cervical cancer, esophageal cancer, brain glioma, nasopharyngeal cancer, and rectal cancer datasets had better or equivalent prediction performance. Therefore, separating the data by disease type may yield a GPR prediction model with higher accuracy. In summary, this study established machine learning models based on plan complexity to predict GPR, and the model built with the random forest algorithm performed best. Adding dosimetric evaluation indices to the plan complexity features improved prediction accuracy and provides a reference for improving GPR prediction model performance and for feature selection. Building models on disease-specific data offers an approach to obtaining machine learning models with higher prediction accuracy. |
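
As a rough illustration of the quantity being predicted, the gamma passing rate described above can be sketched for a one-dimensional dose profile. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `gamma_passing_rate`, the 3%/3 mm criteria, the global normalization to the maximum calculated dose, and the 10% low-dose cutoff are all illustrative choices that the abstract does not specify, and clinical EPID verification works on two-dimensional images with interpolation.

```python
import math


def gamma_passing_rate(positions_mm, measured, calculated,
                       dose_tol=0.03, dist_tol_mm=3.0, low_dose_cutoff=0.10):
    """Percentage of measured points with gamma <= 1 on a 1D profile.

    Assumptions (not taken from the source): global normalization to the
    maximum calculated dose, 3%/3 mm criteria, and a 10% low-dose cutoff.
    """
    ref_max = max(calculated)
    dose_norm = dose_tol * ref_max  # dose criterion in absolute dose units
    evaluated = 0
    passed = 0
    for x_m, d_m in zip(positions_mm, measured):
        # Skip low-dose points so they do not inflate the passing rate.
        if d_m < low_dose_cutoff * ref_max:
            continue
        evaluated += 1
        # Gamma is the minimum combined distance/dose deviation over all
        # calculated points (no interpolation in this simplified sketch).
        gamma_sq = min(
            ((x_c - x_m) / dist_tol_mm) ** 2 + ((d_c - d_m) / dose_norm) ** 2
            for x_c, d_c in zip(positions_mm, calculated)
        )
        if math.sqrt(gamma_sq) <= 1.0:
            passed += 1
    if evaluated == 0:
        raise ValueError("no points above the low-dose cutoff")
    return 100.0 * passed / evaluated


# Toy profile: one of five measured points deviates by 10% of the maximum.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
calculated_dose = [100.0, 100.0, 100.0, 100.0, 100.0]
measured_dose = [100.0, 100.0, 110.0, 100.0, 100.0]
print(gamma_passing_rate(positions, measured_dose, calculated_dose))  # 80.0
```

In this toy case the deviating point cannot be rescued by the distance criterion because the calculated profile is flat, so four of five points pass. A field-level GPR computed along these lines is the target variable that the regression and classification models are trained to predict.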