Minimally invasive surgery, such as microsurgery or endoscopic surgery, can improve the accuracy and safety of operations and is one of the development trends of modern surgery. Surgical robots help surgeons perform highly difficult and precise operations, and automatic analysis of surgical images can provide surgeons with rich contextual information, such as recognition of surgical phases and identification of high-risk areas. In particular, semantic segmentation provides the category and location of surgical instruments and anatomical organs, giving surgeons intuitive prompts for safe operation. Owing to its superior performance, supervised deep learning has become the mainstream approach to surgical image segmentation. Although the number of surgical operations is huge, with 250 to 300 million performed worldwide every year, the large variety of procedures and the long duration of each operation make it very labor-intensive to annotate images covering a whole procedure, so manually labeling large numbers of surgical images is difficult. Traditional fully supervised models achieve high segmentation accuracy, but they contain a large number of parameters and require a large amount of labeled data for training, and the lack of images from diverse surgical environments also limits their generalization ability. Moreover, the operating environment is often complex, with many interfering factors such as smoke, blood, strong light, and motion artifacts, which degrade the performance of the network model and lead to poor robustness.

This paper proposes a semi-supervised deep learning model based on the DeepLabv3+ network architecture that effectively mitigates these problems. The core idea is to build on the DeepLabv3+ architecture and the principle of domain adaptation, training the network with cross-consistency and applying multiple perturbation functions to the baseline network. The network consists of a backbone, composed of a main encoder and a main decoder, together with multiple auxiliary decoders that contain perturbation functions; during training, the auxiliary decoders help improve the performance of the backbone.

The model is applied to the semantic segmentation of cataract surgery images, and comparative experiments are performed to adjust the number of perturbation functions and find the most effective configuration. Compared with the baseline network, accuracy increases by 1.71% in experimental setting 1 and by 22.34% in experimental setting 2, a clear improvement. The experimental results show the effectiveness of enforcing the consistency principle at the output layer of the network rather than at the input layer. Built on semi-supervised deep learning theory, the proposed model and cross-consistency training method are simple and flexible and can easily be extended to use labels from multiple domains. The proposed network generalizes well, can be applied to multiple types of natural images with good results, and demonstrates the validity of the cross-consistency principle in medical image processing. It is robust and accurate, with segmentation accuracy comparable to the state of the art in this field.
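To illustrate the cross-consistency training scheme described above, the following is a minimal PyTorch-style sketch, not the authors' exact implementation: a shared encoder feeds a main decoder trained on labeled images, while auxiliary decoders receive perturbed encoder features and are trained to match the main decoder's predictions on unlabeled images, so consistency is enforced at the output level. The module names, the particular perturbations (feature noise and feature dropout), the simple 1x1 auxiliary heads, and the loss weighting are illustrative assumptions.

```python
# Sketch of cross-consistency training with auxiliary decoders
# (assumed PyTorch-style code; names, perturbations, and loss
# weighting are illustrative, not the authors' exact implementation).
import torch
import torch.nn.functional as F
from torch import nn


class FeatureNoise(nn.Module):
    """Perturbation: add multiplicative uniform noise to shared encoder features."""
    def forward(self, feats):
        noise = torch.empty_like(feats).uniform_(-0.3, 0.3)
        return feats + feats * noise


class FeatureDropout(nn.Module):
    """Perturbation: randomly drop whole feature channels."""
    def forward(self, feats):
        return F.dropout2d(feats, p=0.5, training=True)


class CrossConsistencySegmenter(nn.Module):
    def __init__(self, encoder, main_decoder, num_classes, aux_perturbations):
        super().__init__()
        self.encoder = encoder            # e.g. a DeepLabv3+-style encoder
        self.main_decoder = main_decoder  # assumed to output full-resolution logits
        self.perturbations = nn.ModuleList(aux_perturbations)
        # One auxiliary decoder per perturbation; a 1x1 head is used here for
        # brevity ("out_channels" is an assumed attribute of the encoder).
        self.aux_decoders = nn.ModuleList(
            nn.Conv2d(encoder.out_channels, num_classes, kernel_size=1)
            for _ in aux_perturbations
        )

    def forward(self, x_labeled, y_labeled, x_unlabeled, cons_weight=1.0):
        # Supervised branch: main encoder + main decoder on labeled images.
        logits_l = self.main_decoder(self.encoder(x_labeled))
        sup_loss = F.cross_entropy(logits_l, y_labeled)

        # Unsupervised branch: the main decoder's prediction on unlabeled
        # images is the (detached) target for every auxiliary decoder.
        feats_u = self.encoder(x_unlabeled)
        with torch.no_grad():
            target = torch.softmax(self.main_decoder(feats_u), dim=1)

        cons_loss = 0.0
        for perturb, decoder in zip(self.perturbations, self.aux_decoders):
            pred = decoder(perturb(feats_u))
            pred = F.interpolate(pred, size=target.shape[-2:],
                                 mode="bilinear", align_corners=False)
            cons_loss = cons_loss + F.mse_loss(torch.softmax(pred, dim=1), target)
        cons_loss = cons_loss / len(self.aux_decoders)

        # Consistency at the output layer trains the encoder and the auxiliary
        # decoders; the supervised term trains the encoder and the main decoder.
        return sup_loss + cons_weight * cons_loss
```

In the setting described above, the backbone corresponds to the DeepLabv3+ encoder and decoder, and the number of auxiliary decoders (perturbation functions) is tuned experimentally; the sketch only shows how the supervised loss and the output-level consistency loss would be combined in such a scheme.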