Objective: Intensity-modulated radiotherapy (IMRT) is the most definitive treatment for non-metastatic nasopharyngeal carcinoma (NPC). In IMRT, accurate delineation of the target volume, especially the gross tumor volume (GTV), is of great importance. However, target volume delineation for NPC has so far been done entirely by hand, which is time-consuming, laborious, and error-prone. Moreover, the interobserver variability introduced by manual delineation is difficult to avoid. These factors directly or indirectly affect the quality of IMRT planning and may cause treatment failure in NPC. Artificial intelligence (AI) based on deep learning has made great progress in object recognition and image segmentation. A variety of deep learning models, such as convolutional neural networks (CNN) and UNet networks, have been proposed to segment the GTV and organs at risk (OAR) in NPC, and many of them have achieved state-of-the-art performance. However, the previous automatic segmentation methods for the GTV of NPC were all based on supervised learning (SL), which requires a large number of manually labeled images for model training. In clinical practice, it is difficult to obtain a large amount of high-quality manually labeled images, and it is expensive to invite experts to label them. These clinical constraints motivated us to build an automatic segmentation model for NPC that needs only a small amount of manually labeled images. Semi-supervised learning (SSL) is capable of achieving this goal. Unlike SL, it allows a model to be developed from both a small amount of manually labeled data and a large amount of unlabeled data, which greatly reduces the dependence on labeled data and also saves cost. In addition, the results of deep-learning-based automatic segmentation of NPC do not always fully meet the requirements of experts, and some inferior segmentation
results need to be further modified. Previous studies have shown that interactive segmentation between user and machine plays an important role in further optimizing automatic segmentation results. It allows the user to provide additional information about incorrectly segmented areas in the initial segmentation; the model then re-outputs a segmentation that satisfies the user based on this interactive information. In this study, we also call this the adaptive correction for the GTV of NPC. To address the above clinical and technical problems: (1) we plan to build a novel, fully automatic SSL-based model to segment the GTV of NPC (including GTVnx, the nasopharynx gross tumor volume, and GTVnd, the node gross tumor volume), and to compare it with existing SSL-based models to verify its feasibility and superiority; (2) we plan to invite multi-center experts to evaluate and revise the model-generated GTV to further examine the segmentation performance of the model; (3) we plan to use the automatic segmentation model to assist junior radiation oncologists in delineating the GTV, and to assess the improvement in GTV delineation with model assistance, thereby demonstrating the clinical value of the proposed model; (4) we plan to design a correction module on top of the automatic segmentation model and to perform online adaptive correction for GTVs with poor initial segmentation, to further improve the accuracy of GTV delineation.

Materials and methods:

1. Establishment of manually labeled datasets (ground truth) for GTV of NPC
MRI images of patients with newly diagnosed NPC before treatment in our center from 2010 to 2016 were reviewed. According to the inclusion and exclusion criteria, a total of 258 patients were included. We invited two oncologists in our center with more than ten years of radiotherapy experience in NPC to contour the GTVnx and GTVnd, and invited another senior oncologist with more
than 20 years of radiotherapy experience in NPC to check all the contours and make the final decision. After establishing the manually labeled dataset, we randomly split the 258 cases into a training set (n=180), a validation set (n=20), and a test set (n=58).

2. Developing a semi-supervised learning network via URPC for NPC GTV segmentation
We employed 3D UNet as the backbone and proposed a semi-supervised segmentation network via uncertainty rectified pyramid consistency (URPC). The proposed network was composed of a pyramid prediction network (PPNet) and an uncertainty correction module. PPNet performed the image segmentation task and generated multi-scale predictions. It learned from labeled data by directly minimizing the supervised loss, and it was regularized by a multi-scale consistency between the pyramid predictions to handle the unlabeled data. In this way, PPNet could learn from both labeled and unlabeled data. The uncertainty correction module was designed mainly to reduce the influence of noise on the pyramid consistency and to improve the stability of the model during training. After establishing the network framework, we first carried out an ablation study, examining the impact of different output scales, the uncertainty rectification (UR) term, and the uncertainty minimization (UM) term on the performance of URPC. Then, we compared our model with five other SSL-based models on the test set. Finally, we examined the data utilization efficiency of URPC.

3. Analysis of performance of URPC in different subgroup patients
First, we examined how the performance of URPC varied with clinical parameters including gender, age, T stage, N stage, tumor volume, and body mass index (BMI). Second, we analyzed segmentation performance in different GTVnx slices by extracting seven consecutive, equidistant slices beginning at the starting slice of the GTVnx. Third, we specifically investigated the
segmentation results of patients with N3 stage.

4. Assessment of model-generated contours by multi-center experts
We randomly selected 20 patients with model-generated contours in the test set, and invited nine experts with more than ten years of radiotherapy experience from our center, Sichuan Cancer Hospital, and Sichuan Provincial People’s Hospital (three experts per center) to evaluate and modify the automatic segmentation results of the GTVnx and GTVnd. We then compared the model-generated contours with the corresponding contours revised by these experts and calculated the revision degree as (1-DSC)%.

5. Clinical application of automatic segmentation model
We also randomly selected 12 patients in the test set and invited six junior oncologists from three centers (two doctors per center) to delineate the GTV without the aid of the model; four weeks later, we asked them to delineate the GTV again with the aid of the model. The contours delineated without and with model assistance were each compared with the ground truth to evaluate whether the model could improve the accuracy and consistency of GTV contouring.

6. Construction of GTV adaptive correction module
To encode the user’s interaction information efficiently, we proposed an adapted Geodesic distance transformation, in which a negative exponential transformation of the Geodesic distance filters out areas that are not of interest and further strengthens the interaction area. We then obtained the final correction results through the max-flow/min-cut method. In this study, we named this method Geodesic-aware Graph Cuts (GA-Graph Cuts).

7. Clinical application of adaptive correction module
We invited an expert from our center to select 10 GTVnx and 10 GTVnd with poor segmentation results from the test set for online adaptive correction (note: the automatic segmentation results were obtained with 10% labeled data and 90% unlabeled
data). Then, we compared the revised contours with the ground truth to evaluate the performance of the proposed revision module.

8. Comparison with naive 3D Graph Cuts
For the patients selected above, we invited the expert to revise the GTV using our method and 3D Graph Cuts, respectively. When comparing the performance of the two methods, the same set of user interactions was used for a given patient; when comparing the user time of the two methods, we asked the expert to stop interacting once he was satisfied with the modified results.

9. Evaluation metrics
We chose two quantitative indexes that are widely used in image segmentation: the Dice similarity coefficient (DSC) and the average surface distance (ASD). The former measures the similarity between two contours (%), and the latter measures the average distance between the surfaces of two contours (mm). A higher DSC is better, whereas a lower ASD is better.

Results:

1. Established a network framework of automatic segmentation of GTV based on semi-supervised learning via URPC
Based on the backbone of 3D UNet, we constructed an SSL network framework via URPC for NPC GTV segmentation. The ablation study was then conducted on the NPC dataset with 10% labeled data (18 labeled images) and 90% unlabeled data (162 unlabeled images). Compared with the other output scales (S set to 1, 2, 3, and 5), the segmentation performance of the model was best when the output scale of PPNet was set to 4: the DSC of GTVnx and GTVnd were 80.13%±6.37% and 75.83%±12.93%, respectively, and the ASD were 1.82±1.30 mm and 2.65±2.77 mm, respectively. With S=4, after further adding the UR and UM modules to the network, the DSC of GTVnx and GTVnd reached 80.76% and 75.59%, respectively, and the ASD were reduced to 1.69 mm and 2.20 mm, respectively.

2. The proposed automatic
segmentation network outperformed other semi-supervised learning networks
With 10% labeled data and 90% unlabeled data used for model training, the results averaged over GTVnx and GTVnd were as follows. Our model: DSC 78.36%±7.66%, ASD 1.95±1.18 mm; MT: DSC 74.79%±9.15%, ASD 2.25±1.40 mm; ICT: DSC 76.59%±7.98%, ASD 2.15±1.38 mm; EM: DSC 74.89%±8.85%, ASD 2.40±1.54 mm; UAMT: DSC 75.78%±9.67%, ASD 2.11±1.39 mm; DAN: DSC 77.55%±7.39%, ASD 2.15±1.26 mm. Our method had the highest DSC and the lowest ASD of the six methods.

3. URPC could efficiently use unlabeled data
As the proportion of labeled data increased, the mean DSC of GTVnx and GTVnd also increased. However, the mean DSC of URPC was consistently higher than that of SL and DAN at every ratio of labeled images, showing that URPC was able to use unlabeled data to boost performance. When URPC used only 50% of the annotated data, its DSC (82.74%) was comparable to that of SL using 100% of the annotated data (83.51%).

4. The performance of the model was similar in different subgroup patients
GTVnx: (1) There was no difference between genders in DSC (male vs. female: 0.83±0.05 vs. 0.82±0.08, p=0.491) or ASD (1.49±0.91 vs. 1.47±0.62 mm, p=0.923), and no difference between age groups in DSC (≤46 vs. >46: 0.82±0.06 vs. 0.83±0.05, p=0.489) or ASD (1.53±1.03 vs. 1.42±0.45 mm, p=0.634); (2) the DSC was 0.81±0.04 in T1-2 patients and 0.83±0.06 in T3-4 patients. The latter was slightly higher, but the difference did not reach statistical significance (p=0.223); however, the ASD of T3-4 patients was significantly higher than that of T1-2 patients (1.64±0.97 vs. 1.16±0.20 mm, p=0.036); (3) BMI did not correlate with DSC (p=0.415) or with ASD (p=0.702); (4) there was no significant correlation between the volume of GTVnx and DSC (p=0.225); but in terms of ASD, as the volume of GTVnx increased, the
ASD also increased, showing a significant positive correlation (R=0.51, p<0.001).
GTVnd: (1) There was no difference between genders in DSC (male vs. female: 0.79±0.11 vs. 0.81±0.10, p=0.651) or ASD (2.51±3.01 vs. 1.71±1.56 mm, p=0.308), and no difference between age groups in DSC (≤46 vs. >46: 0.81±0.09 vs. 0.78±0.13, p=0.264) or ASD (2.59±3.22 vs. 1.85±1.72 mm, p=0.300); (2) the DSC was 0.80±0.12 in patients with N1-2 and 0.79±0.09 in patients with N3, which was not statistically different (p=0.807); but the ASD of patients with N3 was significantly higher than that of patients with N1-2 (3.86±4.04 vs. 1.56±1.30 mm, p=0.002); (3) BMI did not correlate with DSC (p=0.909) or with ASD (p=0.294); (4) there was no significant correlation between GTVnd volume and DSC (p=0.293). However, as the GTVnd volume increased, the ASD also increased, showing a weak positive correlation (R=0.26, p=0.047).
At the first and last slices (from head to foot) of GTVnx, the DSC values were 0.46±0.24 and 0.47±0.26, respectively, which were significantly lower than those of the intermediate slices (p<0.001). The DSC values of the second, third, fourth, fifth, and sixth slices were 0.84±0.09, 0.87±0.07, 0.86±0.08, 0.83±0.14, and 0.79±0.14, respectively. Among the 18 patients with N3 stage, 4 (22.22%) had missing segmentation for lower metastatic neck lymph nodes.

5. The revision degrees of GTV by multi-center experts were low
19 (95.0%) patients had an average revision degree lower than 10%. In addition, there was no significant difference in the average revision degree among patients with T1, T2, T3, or T4: 5.29%±3.83% vs. 5.22%±3.30% vs. 5.47%±5.83% vs. 3.82%±3.61% (p=0.235). 2 patients (10.0%) had an average revision degree of more than 20%, 2 patients (10.0%) had an average revision degree between 10% and 20%, 1 patient (5.0%) had an average revision degree between 5% and 10%, and the remaining 15 patients (75.0%) had an average revision degree lower than 5%. In addition, among patients with different N stages, N3 patients had the
lowest degree of modification (N1 vs. N2 vs. N3: 8.78%±12.71% vs. 7.76%±10.13% vs. 2.56%±2.99%, p=0.004).

6. The accuracy of model-generated contours was better than that of the junior oncologists
For the 12 NPC patients, the average DSC of the model-generated GTVnx was 0.85±0.03, while the average DSC of the GTVnx delineated by R1 (radiation oncologist 1), R2, R3, R4, R5, and R6 were 0.75±0.10, 0.70±0.12, 0.80±0.05, 0.75±0.05, 0.69±0.10, and 0.78±0.04, respectively. The GTVnx generated by the model was more accurate than the GTVnx delineated by every junior oncologist (all p<0.05). For GTVnd delineation, the average DSC of the model was 0.84±0.12, and the DSC of R1, R2, R3, R4, R5, and R6 were 0.80±0.15, 0.79±0.16, 0.82±0.11, 0.78±0.15, 0.79±0.12, and 0.74±0.11, respectively. The GTVnd generated by the model was more accurate than the GTVnd delineated by R4, R5, and R6 (p<0.05). Although the difference did not reach significance for R1, R2, and R3, the DSC of the model was still higher than that of these junior oncologists.

7. The model helped junior oncologists improve the accuracy of GTV delineation
With the aid of the model, the revision degrees of GTVnx became smaller than those obtained without it for all junior oncologists (all p<0.05). Similarly, with the aid of the model, the revision degrees of GTVnd were much smaller than those obtained without it for five junior oncologists (p<0.05), the exception being R3 (p=0.191).

8. The model helped junior oncologists improve the efficiency of GTV contouring
Without assistance from the model, R1 spent an average of 39.8 minutes delineating one case. When assisted by our model, the delineation time fell to an average of 14.0 minutes, a reduction of approximately 64.8%. Similarly, for R2, R3, R4, R5, and R6, the reductions in time were approximately 70.1%, 61.7%, 78.3%, 61.6%, and 65.9%, respectively. A paired t-test
confirmed that the differences were statistically significant (all p<0.05).

9. Established GA-Graph Cuts for GTV adaptive correction
By applying a negative exponential transformation to the Geodesic distance, we improved the traditional Geodesic distance for encoding user interaction information. Compared with previous methods, including Euclidean distance, Gaussian distance, and naive Geodesic distance, our method was better at strengthening the user’s interaction area and weakening the influence of regions far from the interaction area.

10. GA-Graph Cuts boosted the accuracy of GTV
For GTVnx, the initial average DSC of the 10 selected patients was 0.75±0.08, and the initial average ASD was 2.62±1.28 mm. After adaptive correction, the average DSC increased to 0.79±0.05, and the average ASD decreased to 1.76±0.60 mm. A paired t-test showed that the difference in DSC was close to statistical significance (p=0.054), while the difference in ASD was statistically significant (p=0.040). Similarly, for GTVnd, the initial average DSC and ASD of the 10 selected patients were 0.76±0.13 and 2.82±1.89 mm, respectively. After adaptive correction, the average DSC increased to 0.81±0.08, and the average ASD decreased to 2.21±1.72 mm. The paired t-test showed that the difference in DSC was close to statistical significance (p=0.080), and the difference in ASD was statistically significant (p=0.048).

11. GA-Graph Cuts improved efficiency for GTV revision
When GA-Graph Cuts was used to revise the GTV, the average time was 260.5±77.48 s, whereas without model assistance the average time was 938.0±201.6 s (p<0.001).

12. GA-Graph Cuts was better than the naive 3D Graph Cuts
The refined DSC and ASD of GTVnx by 3D Graph Cuts were 77.65%±5.61% and 2.11±0.70 mm, respectively, while the refined DSC and ASD of GTVnx by GA-Graph Cuts were 79.07%±5.44% and 1.76±0.60 mm, respectively. Similarly, for GTVnd, the refined DSC and ASD by 3D Graph
Cuts were 77.87%±8.95% and 2.45±1.73 mm, respectively, while the refined DSC and ASD by GA-Graph Cuts were 81.31%±7.62% and 2.21±1.72 mm, respectively. The refined DSC of GA-Graph Cuts was higher than that of 3D Graph Cuts, and the refined ASD was lower (all p<0.05). Moreover, GA-Graph Cuts required much less time than 3D Graph Cuts (average time: 260.5±77.48 s vs. 426.6±130.4 s, p=0.002).

Conclusion:

1. We constructed an automatic segmentation network for NPC GTV based on SSL through URPC. Compared with other SSL networks, our segmentation method had clear advantages in segmentation performance and robustness.
2. Compared with the SL method, our SSL method needed only a small amount of labeled data (such as 20%) to obtain excellent segmentation performance, which significantly reduced the dependence on labeled data.
3. Analysis of the GTV segmentation results in different patient subgroups suggested that our segmentation model had similar, state-of-the-art performance across patients of different ages, genders, T stages, N stages, BMI, and tumor volumes.
4. The analysis of the degree of modification by multi-center experts showed that the GTV delineated by our model was quite accurate and was highly recognized by multi-center experts, indicating promising prospects for clinical application.
5. The URPC model improved the accuracy and consistency of GTV contouring for junior oncologists, and also greatly improved efficiency.
6. We proposed a novel interactive segmentation method based on an adapted Geodesic distance transformation and 3D Graph Cuts (GA-Graph Cuts) to perform adaptive correction for NPC GTV. Compared with the naive 3D Graph Cuts, GA-Graph Cuts had better segmentation performance.
7. GA-Graph Cuts markedly improved the accuracy of GTVs with poor initial segmentation. Compared with manual correction, GA-Graph Cuts also improved the efficiency of GTV revision.
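
Illustrative note on the metrics: the DSC, the revision degree (1-DSC)%, and the ASD reported above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the implementation used in the study: contours are represented as sets of voxel coordinates, the ASD is taken as the symmetric mean nearest-neighbour distance between two lists of surface points (the abstract does not state the exact variant), and all function names and toy data are hypothetical.

```python
from math import dist


def dice(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def revision_degree(model_contour, revised_contour):
    """Revision degree in percent, (1 - DSC) * 100, as defined in the methods."""
    return (1.0 - dice(model_contour, revised_contour)) * 100.0


def average_surface_distance(surface_a, surface_b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean nearest-neighbour distance between two surfaces, in mm.

    Assumed definition; brute force, so suitable for toy inputs only.
    `spacing` converts voxel indices to millimetres along each axis.
    """
    if not surface_a or not surface_b:
        raise ValueError("both surfaces must contain at least one point")

    def to_mm(point):
        return tuple(c * s for c, s in zip(point, spacing))

    pts_a = [to_mm(p) for p in surface_a]
    pts_b = [to_mm(p) for p in surface_b]
    d_ab = [min(dist(p, q) for q in pts_b) for p in pts_a]
    d_ba = [min(dist(q, p) for p in pts_a) for q in pts_b]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Toy data (hypothetical, not from the study): 3 of 4 voxels overlap.
model = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
revised = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 2, 0)}
DSC_TOY = dice(model, revised)
REVISION_TOY = revision_degree(model, revised)
ASD_TOY = average_surface_distance([(0, 0, 0), (0, 0, 1)],
                                   [(0, 0, 0), (0, 0, 3)])

# Relative time saving for R1, using the figures quoted in the results.
R1_REDUCTION = (39.8 - 14.0) / 39.8 * 100.0
```

In the toy case the two contours share three of four voxels each, so the DSC is 0.75 and the revision degree is 25%; the last line reproduces the approximately 64.8% time reduction reported for R1.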