The abdomen is a crucial anatomical region of the human body, housing vital organs such as the liver and the ovaries that play fundamental roles in human physiology. Abdominal tumors have high incidence and mortality rates, posing a severe threat to patients' lives and health, so precise diagnosis and treatment of abdominal tumors are urgently needed. Abdominal multi-modal medical image analysis, including the segmentation and classification of abdominal medical images, provides critical guidance for the accurate diagnosis and treatment of abdominal tumors and directly affects patients' quality of life after recovery. The study of abdominal multi-modal medical image analysis methods is therefore of great significance. With the development of convolutional neural networks (CNNs), research on abdominal multi-modal medical image analysis has made great progress. However, several problems remain: small lesions are difficult to segment; 3D CNN-based segmentation models contain a large number of learnable parameters; deep neural network-based models tend to fail when applied to cross-modal image segmentation; and feature representations of benign and malignant tumors are insufficiently discriminative. This thesis focuses on these four difficulties and investigates abdominal multi-modal medical image analysis methods, covering both medical image segmentation and image-based diagnosis. The specific contributions are:

1. To address the limited segmentation accuracy in small lesion regions of abdominal image volumes, this study proposes a cascaded framework for liver and tumor segmentation in abdominal CT volumes using a high-resolution, attention-based network. The method offers several advantages. First, the cascaded strategy, which segments tumors within the previously segmented liver region, effectively alleviates the label-imbalance problem that arises when segmenting tiny tumors. Second, the high-resolution backbone includes a high-resolution branch that preserves spatial details and multi-scale branches that capture abundant multi-scale features. Finally, the attention-based global context extraction module obtains context features with a two-stage criss-cross method while avoiding a significant increase in computational complexity. Extensive experimental results demonstrate the high accuracy of the method in liver and tumor segmentation.

2. Because an abdominal liver CT image is a 3D volume, 2D CNN-based segmentation methods ignore 3D context, 3D CNNs suffer from numerous learnable parameters and high computational cost, and hybrid 2D-3D CNNs are difficult to train end-to-end. To overcome these limitations, we propose a context-enhanced network. The architecture consists of a 2D backbone that extracts 2D features from volume slice patches, a context encoding module that extracts 3D context features across slice patches, and a dual segmentation branch with a complementary loss that guides the network to attend to both the liver region and its boundary. The context-enhanced network derives 3D context from the correlations between 2D features, avoiding the sharp increase in learnable parameters caused by directly using 3D convolutional kernels. In addition, the network can be trained end-to-end, avoiding complex multi-stage training. Experimental results indicate that the method segments the liver accurately and strikes a good balance between segmentation precision and the number of model parameters.

3. CNNs usually fail when applied to cross-modal images because the different imaging principles of different modalities produce different intensity distributions and contrasts. This thesis addresses the domain shift in the cross CT-MRI liver segmentation task and proposes a zero-shot bidirectional cross-modal liver segmentation method based on a parameter-free latent space derived from prior knowledge of CT and MRI images. To our knowledge, this is the first work to explore zero-shot bidirectional cross-modal image segmentation, and it provides a new insight into cross-modal liver segmentation: domain shift can be addressed through parameter-free latent-space feature mining. Because the zero-shot method applies to unseen target domains, it is well suited to clinical application. Experimental results on CT and MRI images from public and local datasets indicate that the method overcomes CNN failures caused by domain shift across modalities and achieves promising cross-modal liver segmentation results.

4. To address the limited discriminative power of feature representations in existing benign-versus-malignant tumor diagnosis methods, this thesis proposes a dynamic fusion network for multi-modal ovarian tumor differentiation. The network consists of dual branches and a dynamic non-linear module (D-NonL module). The dual branches exploit the complementary information in T1C and T2WI MRI images. The D-NonL module, placed on top of the image representation, updates image features with an iterative non-linear projection parameterized by learned features of patient-wise clinical information. This enables semantic interaction between clinical and image features and adaptively improves the discriminability of the visual representations. Because the dynamic network adapts ovarian tumor features to each patient's clinical information, it achieves accurate patient-wise diagnosis. To verify the effectiveness of the method, an ovarian tumor dataset with multi-modal ovarian images and clinical information was created. Experimental results indicate that the method achieves more accurate patient-wise differentiation of benign and malignant ovarian tumors, and that its features are more discriminative than those of other multi-modal fusion methods, leading to superior performance.
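The two-stage criss-cross context extraction mentioned in contribution 1 can be sketched as follows. This is a simplified, single-head NumPy illustration under our own assumptions (mean-free dot-product attention, no learned query/key/value projections, and the criss-cross set counted with a duplicate at the center position), not the thesis implementation: each position attends only to positions in its own row and column, and applying the operation twice lets information from every position reach every other position at far lower cost than full self-attention.

```python
import numpy as np

def criss_cross_attention(q, k, v):
    """One criss-cross pass: each position attends to its row and column.

    q, k, v: arrays of shape (H, W, C). Returns an array of shape (H, W, C).
    A sketch only: real criss-cross attention uses learned projections and
    avoids double-counting the center position.
    """
    H, W, C = q.shape
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            # Keys/values in the criss-cross set: same row plus same column.
            ks = np.concatenate([k[i, :, :], k[:, j, :]], axis=0)  # (H+W, C)
            vs = np.concatenate([v[i, :, :], v[:, j, :]], axis=0)  # (H+W, C)
            logits = ks @ q[i, j]                                  # (H+W,)
            w = np.exp(logits - logits.max())
            w /= w.sum()                                           # softmax
            out[i, j] = w @ vs
    return out

# Two-stage application: after the second pass, every position has
# (indirectly) aggregated context from the whole feature map.
x = np.random.default_rng(0).standard_normal((4, 5, 3))
y = criss_cross_attention(x, x, x)
z = criss_cross_attention(y, y, y)
```

Per position this attends over H+W entries instead of H*W, which is why the text can claim global context "while avoiding a significant increase in computational complexity": the cost is O(HW(H+W)) rather than the O((HW)^2) of full self-attention.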
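Contribution 2's key idea, obtaining 3D context from correlations between 2D slice features instead of 3D convolutions, can be illustrated with a minimal NumPy sketch. All names here are hypothetical and the aggregation is parameter-free for simplicity, whereas the thesis's context encoding module is learned:

```python
import numpy as np

def context_from_slice_features(feats):
    """Aggregate 3D context across slices via inter-slice feature correlations.

    feats: (D, C) array of pooled 2D features for D adjacent slices.
    Returns (D, C): each slice's feature augmented with a correlation-weighted
    mixture of the other slices' features. No 3D kernels are involved, so the
    parameter count of a real module built this way stays that of the 2D backbone.
    """
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    affinity = f @ f.T                                  # (D, D) cosine correlations
    w = np.exp(affinity)
    w /= w.sum(axis=1, keepdims=True)                   # row-wise softmax
    return w @ feats                                    # context-enhanced features

slice_feats = np.random.default_rng(1).standard_normal((6, 8))  # 6 slices, C=8
ctx = context_from_slice_features(slice_feats)
```

Because the cross-slice interaction is expressed as a D-by-D affinity over already-computed 2D features, the whole pipeline remains a single differentiable graph, which is what makes the end-to-end training claimed in the text possible.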
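The D-NonL module of contribution 4 is described as an iterative non-linear projection of image features whose parameters come from clinical features. A FiLM-style hypernetwork is one plausible reading; the sketch below is our own assumption (the weight matrices `W_scale`, `W_shift` and the step count are hypothetical), not the thesis's definition:

```python
import numpy as np

def d_nonl_update(img_feat, clin_feat, W_scale, W_shift, steps=3):
    """Update image features with a non-linear projection parameterized by
    clinical features (hypernetwork-style sketch, not the thesis module).

    img_feat: (C,) image feature vector; clin_feat: (K,) clinical feature
    vector; W_scale, W_shift: (C, K) matrices mapping clinical features to
    per-channel scale and shift. Iterating the projection lets clinical
    information repeatedly reshape the visual representation.
    """
    gamma = np.tanh(W_scale @ clin_feat)                # (C,) dynamic scale
    beta = W_shift @ clin_feat                          # (C,) dynamic shift
    x = img_feat
    for _ in range(steps):
        x = np.tanh((1.0 + gamma) * x + beta)           # iterative non-linear projection
    return x

rng = np.random.default_rng(2)
img = rng.standard_normal(8)
clin = rng.standard_normal(4)
Ws, Wb = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
fused = d_nonl_update(img, clin, Ws, Wb)
```

Because `gamma` and `beta` differ per patient, two patients with identical images but different clinical records receive different fused representations, which is the "dynamic" behavior the abstract credits for accurate patient-wise diagnosis.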