Throat swabbing is currently the most common method of collecting nucleic acid samples. During collection, medical staff face the risk of cross-infection as well as the physical and mental strain of a heavy workload, so automating throat swab sampling has become an important research topic. When a robot performs the sampling, the M-shaped region in an oral image of the subject must be segmented accurately and quickly. This is challenging because the boundary between the M-shaped region and the surrounding tissue is indistinct, and interference factors such as lighting, the cotton swab, the degree of mouth opening, and oral diseases are present. Compared with traditional image segmentation techniques, deep learning offers clear advantages in both accuracy and real-time performance. This thesis therefore addresses pharyngeal image segmentation in real sampling scenarios, proposes two improved algorithms based on U-Net and Deeplabv3+, and, building on the better-performing one, uses depth information to compute the 3D coordinates of the pixels in the sampling area. The main work is as follows:

(1) A pharyngeal image dataset containing various kinds of noise was produced, and a preprocessing method for it was proposed. Interference factors such as lighting and occlusion were deliberately included when the images were captured, and some images came from subjects suffering from tonsillitis, which improves the anti-interference ability of the trained model. Preprocessing consists of de-reflection, signal enhancement, and data augmentation: de-reflection reduces the influence of specular highlights introduced during shooting on the training result; the GrabCut algorithm roughly segments each image, and the result is fused with the original image to strengthen the signal of the region to be segmented; data augmentation compensates for the small sample size of the dataset (a minimal sketch of this pipeline is given after the summary below).

(2) An improved network based on U-Net was proposed. VGG16 serves as the backbone feature extractor, and the decoder is modified so that the final feature map matches the size of the input image. Residual structures are added so that the model combines context information more effectively, and an attention mechanism is introduced into the skip connections between corresponding encoder and decoder layers to suppress the response of irrelevant information and highlight salient features (see the attention-gate sketch below). Comparative experiments show that, relative to the baseline U-Net, the improved algorithm raises the four evaluation indicators of precision, recall, Dice coefficient, and Mean Intersection over Union (MIoU) by 7.12%, 8.31%, 7.16%, and 7.50%, respectively. It performs well on pharyngeal image segmentation, although its accuracy on some diseased pharyngeal images remains unsatisfactory.
(3) An improved network based on Deeplabv3+ was proposed. MobileNetV2 replaces the original backbone for extracting deep and shallow features; its lightweight structure greatly reduces the number of parameters of the whole model and thus speeds up training and inference. The MobileNetV2 structure is further modified with a parallel attention mechanism and branch convolutions to improve segmentation accuracy at the boundary of the target region, and a mixed loss function alleviates the imbalance between positive and negative samples in the images (a sketch of one such loss follows). In experiments on the custom dataset, the parameter count drops sharply, training time is reduced, and segmentation accuracy improves further: the improved algorithm exceeds the improved U-Net by 6.80%, 6.04%, 7.02%, and 6.89% on the four evaluation indicators. It can effectively assist the robot arm in segmenting and locating the pharyngeal region.

(4) Depth information was used to compute the 3D coordinates of the pixels of the sampling region in the segmentation result. The color stream and depth stream of each frame are extracted from the video captured by a lidar-based RGB-D (RGB + Depth) camera, and the color map and depth map are registered. The camera intrinsic and extrinsic matrices obtained by calibration are then used to compute the 3D coordinates of the segmented pixels in the camera coordinate system (a back-projection sketch is given below).
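The following is a minimal sketch of the preprocessing pipeline from (1), written with OpenCV. The highlight threshold, the GrabCut initialization rectangle, the blending weight, and the augmentation ranges are illustrative assumptions, not values from the thesis.

```python
# Sketch of the dataset preprocessing in (1): de-reflection, GrabCut-based
# signal enhancement, and simple augmentation. All thresholds/weights are
# illustrative assumptions.
import cv2
import numpy as np

def remove_reflections(bgr, thresh=230):
    """De-reflection: inpaint over near-saturated (specular highlight) pixels."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    mask = (gray > thresh).astype(np.uint8) * 255          # highlight mask
    return cv2.inpaint(bgr, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

def enhance_with_grabcut(bgr, rect, alpha=0.6):
    """Rough GrabCut segmentation fused with the original image to
    strengthen the signal of the region to be segmented."""
    mask = np.zeros(bgr.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1.0, 0.0)
    fg = fg[:, :, None].astype(np.float32)
    # Keep the (probable) foreground at full strength, attenuate the rest.
    fused = bgr.astype(np.float32) * (alpha + (1.0 - alpha) * fg)
    return fused.astype(np.uint8)

def augment(bgr, rng):
    """Simple augmentation: random horizontal flip and brightness jitter."""
    if rng.random() < 0.5:
        bgr = cv2.flip(bgr, 1)
    gain = rng.uniform(0.8, 1.2)
    return np.clip(bgr.astype(np.float32) * gain, 0, 255).astype(np.uint8)

# Usage (rect is a hand-chosen (x, y, w, h) box around the pharyngeal area):
# rng = np.random.default_rng(0)
# img = augment(enhance_with_grabcut(remove_reflections(img), rect), rng)
```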
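The attention mechanism in the skip connections of the improved U-Net in (2) can be sketched as an attention gate in the style of Attention U-Net; the exact design in the thesis may differ, and the channel sizes below are placeholders.

```python
# Sketch of an attention gate on a U-Net skip connection (PyTorch),
# following the common Attention U-Net formulation.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Sequential(nn.Conv2d(g_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.w_x = nn.Sequential(nn.Conv2d(x_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # g: decoder gating signal; x: encoder feature from the skip connection.
        # Assumes g has already been upsampled to x's spatial size.
        a = self.psi(self.relu(self.w_g(g) + self.w_x(x)))  # attention map in [0, 1]
        return x * a  # suppress irrelevant responses, highlight salient features

# In the decoder, the gated skip feature is concatenated with the upsampled map:
# x_skip = AttentionGate(g_ch=256, x_ch=256, inter_ch=128)(g, x_skip)
# y = torch.cat([upsampled, x_skip], dim=1)
```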
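For reference, the four evaluation indicators reported in (2) and (3) can be computed from the confusion counts of a binary mask as sketched below; the two-class MIoU shown is the standard definition and is assumed to match the thesis.

```python
# Sketch of the four evaluation indicators for a binary segmentation mask.
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-6):
    """pred, gt: boolean arrays of the same shape (foreground = True)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    # Two-class MIoU: mean of foreground IoU and background IoU.
    miou = 0.5 * (tp / (tp + fp + fn + eps) + tn / (tn + fp + fn + eps))
    return precision, recall, dice, miou
```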
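A common form of the mixed loss mentioned in (3) combines binary cross-entropy with Dice loss; the specific combination and the 0.5/0.5 weights below are assumptions, as the abstract does not give the exact formulation.

```python
# Sketch of a mixed BCE + Dice loss for imbalanced binary segmentation (PyTorch).
import torch
import torch.nn as nn

class MixedLoss(nn.Module):
    def __init__(self, w_bce=0.5, w_dice=0.5, eps=1e-6):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.w_bce, self.w_dice, self.eps = w_bce, w_dice, eps

    def forward(self, logits, target):
        # logits, target: (N, 1, H, W); target is a float mask in {0., 1.}.
        bce = self.bce(logits, target)
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(1, 2, 3))
        denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        # Dice loss is insensitive to the large number of background pixels,
        # which is what counters the positive/negative imbalance.
        dice = 1 - (2 * inter + self.eps) / (denom + self.eps)
        return self.w_bce * bce + self.w_dice * dice.mean()
```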
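The 3D localization step in (4) amounts to back-projecting each segmented pixel, together with its registered depth value, through the pinhole camera model. The intrinsics (fx, fy, cx, cy) come from calibration; the numeric values in the usage example are placeholders.

```python
# Sketch of back-projecting a segmented pixel with registered depth to
# 3D camera coordinates via the pinhole model.
import numpy as np

def pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Return (X, Y, Z) in the camera coordinate system, in meters:
    Z = depth, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example with placeholder intrinsics (fx, fy in pixels):
# p_cam = pixel_to_camera(u=320, v=240, depth_m=0.35,
#                         fx=615.0, fy=615.0, cx=320.0, cy=240.0)
# If world coordinates are needed, apply the extrinsics: p_world = R @ p_cam + t.
```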