
Research On CT Image Segmentation Method Based On Sliding Window Attention

Posted on: 2024-04-23    Degree: Master    Type: Thesis
Country: China    Candidate: Z B Yin    Full Text: PDF
GTID: 2544306941996859    Subject: Network security technology and engineering
Abstract/Summary:
CT image segmentation is an important task in computer vision. U-shaped network models that integrate the Transformer have achieved excellent results in medical image segmentation. At present, these models mainly compute self-attention within fixed windows. Although the results are significant, vision Transformer models that compute attention within fixed windows still have several problems: (1) positional information is treated unequally across instances: fixed window partitioning and fixed window shifting give different instances different receptive fields; (2) the input image size is constrained: to make fixed-window partitioning convenient, the image must be cropped and resized to a fixed scale; (3) the traditional patch-encoding scheme loses image semantic information; (4) the Transformer structure lacks adaptive weighting along the channel dimension.

To address problems (1), (2), and (3), this thesis proposes SLT-UNet, a U-shaped network model based on a pure vision Transformer. On top of within-window self-attention, the model imitates the sliding pattern of traditional convolution to implement a sliding-window Transformer module, which is stacked into a U-shaped encoder-decoder structure. The input medical image is divided into non-overlapping patches, and each patch is fed into the encoder as a token. The extracted feature maps are passed to the decoder, fused with the same-scale encoder features through skip connections, and the segmentation result is finally restored to the original resolution.

To address problem (4), this thesis proposes SLTC-UNet, a U-shaped network model in which a vision Transformer and a convolutional neural network run in parallel. First, building on the fusion of the sliding-window attention module and a dilated convolution module, the model places a sliding-window local self-attention module and a dilated convolution module in parallel, combining the complementary advantages of convolution and the self-attention mechanism. Second, a new U-shaped network model is built to segment CT images of a multi-organ dataset. Then, large-kernel convolution is realized by stacking dilated convolutions, and depthwise separable convolution is used to further reduce the parameter count. Finally, a relative position bias is added to the local sliding-window attention module, so that the model can further learn the positional relations of pixels across windows.

Exploiting the principle of locality, Colossal-AI is used in the experiments to expand the effective GPU memory, which improves the experimental machine's capacity to host the model, and the model is encapsulated to reduce the attention paid to the system's computing-resource scheduling. Experiments on the multi-organ Synapse dataset verify the effectiveness of both methods, and the results surpass the baseline model and traditional fixed-window U-shaped vision Transformer models. Ablation experiments further verify the segmentation accuracy of the two models on the multi-organ segmentation task and the effectiveness of each module. After supplementary training on the ACDC dataset, the proposed models still achieve excellent performance, indicating good generalization ability and robustness.
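The sliding-window attention described above can be illustrated with a minimal NumPy sketch: attention is computed independently inside each fixed window, and a half-window shift between two such passes lets border tokens attend across the old window boundary, imitating a convolutional slide. The shift size, the single unprojected head, and all function names here are simplifying assumptions for illustration, not the thesis's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(tokens, window):
    """Scaled dot-product self-attention computed independently within each window."""
    n, d = tokens.shape
    out = np.empty_like(tokens)
    for start in range(0, n, window):
        w = tokens[start:start + window]        # tokens belonging to this window
        scores = w @ w.T / np.sqrt(d)           # within-window attention scores
        out[start:start + window] = softmax(scores) @ w
    return out

def sliding_window_block(tokens, window):
    """Fixed-window attention, then a half-window slide and a second pass,
    so tokens near a window border can also attend across that border."""
    x = window_attention(tokens, window)
    shift = window // 2
    x = np.roll(x, -shift, axis=0)              # slide the window grid
    x = window_attention(x, window)
    return np.roll(x, shift, axis=0)            # restore original token order
```

Stacking such blocks at successively coarser scales is what forms the U-shaped encoder-decoder sketched in the abstract.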
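The convolutional branch of SLTC-UNet relies on dilated convolution with a depthwise-separable structure to emulate a large kernel cheaply. A hypothetical 1-D NumPy sketch of that building block follows; the shapes, names, and the 1-D setting are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def dilated_depthwise_separable(x, depth_k, point_w, dilation):
    """Depthwise dilated convolution (one kernel per channel) followed by a
    1x1 pointwise convolution. Stacking dilations (1, 2, 4, ...) grows the
    receptive field like a large kernel while keeping roughly C*k + C*C
    parameters instead of C*C*k_large."""
    n, c = x.shape
    k = depth_k.shape[0]                        # depthwise kernel taps, shape (k, c)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))        # same-length "same" padding
    depth_out = np.zeros_like(x)
    for i in range(k):                          # each channel filtered independently
        depth_out += xp[i * dilation : i * dilation + n] * depth_k[i]
    return depth_out @ point_w                  # pointwise 1x1 mixes channels
```

Chaining this with increasing `dilation` values approximates the large-kernel stacking described in the abstract while keeping the parameter count small.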
Keywords/Search Tags: Vision Transformer, CT image segmentation, UNet, Receptive field