Brain tumors are among the most common diseases of the brain, and glioma is one of the most aggressive malignant brain tumors. Such tumors can raise intracranial pressure and eventually endanger the patient's life, so early diagnosis and treatment based on medical image analysis is of great importance. Among the various medical imaging technologies, magnetic resonance imaging (MRI) is one of the most commonly used diagnostic tools for imaging the interior of the human body: it involves no ionizing radiation and can display soft tissue and various anatomical structures in multiple sequences and directions. However, it is time-consuming and labor-intensive for doctors to manually combine these sequences and complete a fine segmentation, so effective automatic segmentation of brain tumors has become an important research topic.

Image analysis based on deep learning is one of the most active research directions: by developing convolutional neural networks (CNNs) or Transformers, visual tasks such as semantic segmentation can be completed directly end to end. However, traditional deep learning algorithms still face several problems when processing brain tumor MR images: (1) traditional CNNs lack prior knowledge of brain tumors and ignore the influence of the semantic gap on feature fusion; (2) CNNs and Transformers suffer from large parameter counts and high computational complexity when processing 3D volumetric images, which increases the difficulty of model training; (3) classical model architectures generalize poorly, and the network framework still needs to be optimized and explored for each specific imaging task. To solve these problems, this paper approaches brain tumor segmentation from the perspective of semantic segmentation algorithms, explores and applies deep-learning-based segmentation methods, and proposes three medical image segmentation models based on CNNs and Transformers.

(1) Sliced samples exhibit an imbalance in tumor size and category, which directly affects the final segmentation similarity of the model. At the same time, U-shaped networks ignore the compatibility of features in their skip connections, which easily leads to semantic confusion. To address this, this paper proposes MSMANet, a 2D multi-scale mesh aggregation feature extraction CNN. First, a modified Res-Inception module is introduced into the encoder in place of standard convolutions to extract and aggregate effective features from receptive fields of different sizes (the general pattern is sketched below). Second, a novel mesh aggregation strategy is proposed to gradually refine the visual features and thereby alleviate the semantic gap between encoder and decoder; the strategy also maximizes the aggregation of multi-level features at different scales and lets features at different levels complement one another. Finally, by appropriately arranging attention mechanisms and a deep supervision strategy, the recognition and convergence abilities of the network are effectively improved.
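The abstract does not give the internal structure of the modified Res-Inception module, so the PyTorch sketch below only illustrates the general pattern such a module builds on: parallel convolution branches with different receptive fields, concatenated, fused, and added to a residual shortcut. The branch layout and channel split here are assumptions, not MSMANet's actual design.

    import torch
    import torch.nn as nn

    class ResInceptionSketch(nn.Module):
        # Hypothetical Res-Inception-style block: parallel branches with
        # 1x1, 3x3, and stacked-3x3 (5x5-equivalent) receptive fields plus
        # pooled context, concatenated and fused, with a residual shortcut.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            b = out_ch // 4  # per-branch width (illustrative choice)
            def conv(cin, cout, k):
                return nn.Sequential(
                    nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
                    nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
            self.b1 = conv(in_ch, b, 1)                    # fine detail
            self.b3 = conv(in_ch, b, 3)                    # mid-scale context
            self.b5 = nn.Sequential(conv(in_ch, b, 3),     # two 3x3 ~ one 5x5
                                    conv(b, b, 3))
            self.bp = nn.Sequential(nn.MaxPool2d(3, 1, 1), # pooled context
                                    conv(in_ch, out_ch - 3 * b, 1))
            self.fuse = conv(out_ch, out_ch, 1)
            self.shortcut = (nn.Identity() if in_ch == out_ch
                             else conv(in_ch, out_ch, 1))

        def forward(self, x):
            y = torch.cat([self.b1(x), self.b3(x),
                           self.b5(x), self.bp(x)], dim=1)
            return self.fuse(y) + self.shortcut(x)         # residual aggregation

Aggregating branches of different kernel sizes is what allows a single encoder stage to respond to both small and large tumor regions, which is the property the abstract attributes to the module.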
(2) To address the problems that 2D CNNs cannot fully exploit the spatial context of 3D images and that traditional 3D CNNs require excessive computation, this paper proposes GSPNet, an ultra-lightweight 3D Ghost spatial pyramid convolutional neural network. First, the Ghost module replaces standard convolutions to reduce the computation of the network. Second, we propose a lightweight Ghost spatial pyramid convolution module built on the Ghost module; in the encoding path it learns features under different receptive fields at low computational cost, improving the representation and multi-scale feature processing ability of the network (the Ghost building blocks are sketched below). Finally, we propose a residual Ghost module as the decoder to refine semantic information and avoid network degradation.

(3) To address the problem that CNNs cannot model long-range dependencies, this paper further improves GSPNet and proposes GMetaNet, a lightweight model combining self-attention and MetaFormer. GMetaNet contains a shared encoding path and two decoding paths. The encoding path uses the proposed GSP module and Ghost self-attention (GSA) to extract local and global context features, while the two decoding paths continue to model local and global context information respectively. The local decoder of GMetaNet is the same as that of GSPNet, while the global decoder introduces an improved MetaFormer architecture to process the high-level semantic features that carry global information (a generic MetaFormer template is sketched below). At the same time, the feature maps of the local decoding path are aggregated into the global decoding path, so that local features with different distributions enhance the representation of the global decoding path while the computation of re-modeling is reduced. An auxiliary loss is attached to the end of each of the local and global decoders, and the main loss function is applied to the feature map that aggregates the local and global context.
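GSPNet's core building block is the Ghost module of GhostNet (Han et al., 2020), which the abstract says replaces standard convolutions; a minimal 3D version follows. The spatial-pyramid part, realized here as parallel Ghost branches with increasing dilation rates, is only an assumption about how "different receptive fields at low computational cost" might be obtained, not the paper's verified design.

    import math
    import torch
    import torch.nn as nn

    class GhostModule3D(nn.Module):
        # GhostNet idea in 3D: a small primary convolution produces a few
        # "intrinsic" maps, and cheap depthwise convolutions generate the
        # remaining "ghost" maps, so most output channels cost very little.
        def __init__(self, in_ch, out_ch, ratio=2, dw_size=3, dilation=1):
            super().__init__()
            self.out_ch = out_ch
            init_ch = math.ceil(out_ch / ratio)       # intrinsic maps
            new_ch = init_ch * (ratio - 1)            # ghost maps
            self.primary = nn.Sequential(
                nn.Conv3d(in_ch, init_ch, 1, bias=False),
                nn.BatchNorm3d(init_ch), nn.ReLU(inplace=True))
            self.cheap = nn.Sequential(
                nn.Conv3d(init_ch, new_ch, dw_size, dilation=dilation,
                          padding=dilation * (dw_size // 2),
                          groups=init_ch, bias=False),  # depthwise = cheap
                nn.BatchNorm3d(new_ch), nn.ReLU(inplace=True))

        def forward(self, x):
            y = self.primary(x)
            out = torch.cat([y, self.cheap(y)], dim=1)
            return out[:, :self.out_ch]               # trim to target width

    class GhostSpatialPyramid3D(nn.Module):
        # Assumed pyramid: parallel Ghost branches with growing dilation
        # rates cover different receptive fields; a 1x1x1 convolution fuses.
        def __init__(self, in_ch, out_ch, dilations=(1, 2, 3)):
            super().__init__()
            b = out_ch // len(dilations)
            self.branches = nn.ModuleList(
                GhostModule3D(in_ch, b, dilation=d) for d in dilations)
            self.fuse = nn.Sequential(
                nn.Conv3d(b * len(dilations), out_ch, 1, bias=False),
                nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))

        def forward(self, x):
            return self.fuse(torch.cat([br(x) for br in self.branches], dim=1))

Because most output channels come from the depthwise "cheap" operation, the module's cost grows roughly with out_ch / ratio rather than out_ch, which is what makes a 3D multi-branch pyramid affordable.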
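The "improved MetaFormer" in the global decoder presumably follows the general MetaFormer abstraction (Yu et al., 2022): a pluggable token mixer and a channel MLP, each wrapped in a pre-norm residual sub-block. The sketch below shows that generic template with a pooling mixer as a cheap default; GMetaNet's specific improvements and mixer choice are not stated in the abstract and are not reproduced here.

    import torch
    import torch.nn as nn

    class AvgPoolMixer(nn.Module):
        # Pooling token mixer (as in PoolFormer): average-pool over the
        # token axis, then subtract the input so only the mixing remains.
        def __init__(self, k=3):
            super().__init__()
            self.pool = nn.AvgPool1d(k, stride=1, padding=k // 2)

        def forward(self, x):                          # x: (B, N, C)
            y = self.pool(x.transpose(1, 2)).transpose(1, 2)
            return y - x

    class MetaFormerBlock(nn.Module):
        # Generic MetaFormer template on token sequences (B, N, C):
        #   x = x + TokenMixer(Norm(x)); x = x + ChannelMLP(Norm(x)).
        # The token mixer is deliberately pluggable.
        def __init__(self, dim, mlp_ratio=4, mixer=None):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.mixer = mixer if mixer is not None else AvgPoolMixer()
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                nn.Linear(dim * mlp_ratio, dim))

        def forward(self, x):
            x = x + self.mixer(self.norm1(x))          # token (spatial) mixing
            x = x + self.mlp(self.norm2(x))            # per-token channel mixing
            return x

Swapping the mixer (for example, a self-attention operator in place of pooling) changes the block's behavior without touching the rest of the template, which is the appeal of the MetaFormer view for a lightweight global decoder.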
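Finally, the described training objective, a main loss on the fused local-global prediction plus auxiliary losses on the two decoder heads, can be written as a weighted sum. The abstract does not specify the loss terms or weights, so the Dice-plus-cross-entropy combination and the 0.4 auxiliary weight below are placeholders.

    import torch
    import torch.nn.functional as F

    def dice_loss(logits, target, eps=1e-5):
        # Soft Dice over one-hot targets for 3D volumes: logits are
        # (B, C, D, H, W), target holds class indices (B, D, H, W).
        # A common segmentation loss, used here only as a placeholder.
        prob = torch.softmax(logits, dim=1)
        onehot = F.one_hot(target, prob.shape[1]).movedim(-1, 1).float()
        inter = (prob * onehot).sum(dim=(2, 3, 4))
        union = prob.sum(dim=(2, 3, 4)) + onehot.sum(dim=(2, 3, 4))
        return 1 - ((2 * inter + eps) / (union + eps)).mean()

    def dual_decoder_loss(fused_logits, local_logits, global_logits,
                          target, aux_weight=0.4):
        # Main loss on the aggregated local+global prediction; auxiliary
        # losses on each decoder head. The 0.4 weight is an assumption.
        def seg_loss(logits):
            return F.cross_entropy(logits, target) + dice_loss(logits, target)
        main = seg_loss(fused_logits)
        aux = seg_loss(local_logits) + seg_loss(global_logits)
        return main + aux_weight * aux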