Breast cancer is the most prevalent malignancy among women and has a high mortality rate. Because current medical technology is limited in its ability to treat breast cancer effectively, early detection of lesions is crucial to preventing the disease. In recent years, Contrast-Enhanced Spectral Mammography (CESM) has emerged as a breast cancer screening method that reveals early signs of lesions, such as tumors, microcalcifications, and changes in glandular structure. CESM offers high resolution and is convenient to store, retrieve, and transmit. Moreover, computer processing of CESM enables the analysis and identification of information beyond what is visible to the human eye, giving it several advantages for computer-aided breast cancer diagnosis. Numerous computer diagnostic techniques have been proposed for CESM, and some have been applied in clinical diagnosis.

CESM images are grayscale medical images, and several problems arise in their computer-aided diagnosis. First, a CESM examination usually contains multiple views and modalities, each carrying different information; however, most existing methods are designed around a single image, which loses information and can lead to poor classification performance. Second, radiologists need to switch between different images to obtain comprehensive information about a lesion, which is time-consuming and laborious, and existing methods cannot provide a fused image that assists doctors with both diagnosis and classification. Third, while most existing deep learning methods that use multimodal information extract image features and fuse them across modalities, it is unclear whether the features the network learns from the different modalities are common across modalities and relevant to classification.

To address the problems above, this thesis proposes the following research:

1. To make effective use of the information in multiple views and modalities of CESM images, this thesis proposes a Multiview Multimodal Network (MVMM-Net) that feeds the views and modalities of the same case into a neural network simultaneously to extract features. The method consists of three stages: data preprocessing and input, image feature extraction, and image classification. The first stage preprocesses the CESM images, including labeling the different modalities, removing the black background, and performing data augmentation. The second stage extracts features from the CESM images using a generic convolutional neural network backbone. The last stage integrates the different features for classification. The Res2Net50 backbone achieves the best performance, with accuracies of 96.59% and 94.3% on the test sets of the private and public datasets, respectively. Comparative experiments show that multi-view multimodal features improve the classification performance of the model.
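The core idea of MVMM-Net is a per-view/per-modality feature extractor followed by feature-level fusion and a shared classifier. The sketch below is a minimal, hypothetical PyTorch rendering of that idea; the class name, the number of inputs, and the use of a torchvision ResNet-50 in place of Res2Net50 are illustrative assumptions rather than the thesis's exact implementation.

```python
# Minimal sketch of a multi-view multimodal classifier (MVMM-Net-style).
# A torchvision ResNet-50 stands in for the Res2Net50 backbone used in the
# thesis so the example stays self-contained; names are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiViewMultimodalNet(nn.Module):
    def __init__(self, num_inputs=4, num_classes=2):
        super().__init__()
        # One backbone per view/modality image of the same case.
        self.backbones = nn.ModuleList()
        for _ in range(num_inputs):
            m = resnet50(weights=None)
            m.fc = nn.Identity()  # keep the 2048-d pooled feature
            self.backbones.append(m)
        # Fuse by concatenation, then classify benign vs. malignant.
        self.classifier = nn.Linear(2048 * num_inputs, num_classes)

    def forward(self, images):
        # `images`: list of (B, 3, H, W) tensors, one per view/modality.
        feats = [backbone(x) for backbone, x in zip(self.backbones, images)]
        return self.classifier(torch.cat(feats, dim=1))

# Usage: four inputs (e.g. two views x two modalities) from one case.
model = MultiViewMultimodalNet()
views = [torch.randn(1, 3, 224, 224) for _ in range(4)]
logits = model(views)  # shape (1, 2)
```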
2. To present the multimodal information of a CESM examination in a single image, this thesis proposes a multimodal CESM classification method based on generative adversarial network fusion. The method consists of two parts: a generative adversarial network-based image fusion module and a Res2Net-based multi-view classification module. The fusion module generates a fused image combining the features of the Dual-energy Subtracted (DES) image and the Low-energy (LE) image, while the classification module classifies the fused image as benign or malignant. Attention mechanisms are introduced in both modules to improve the fusion quality and the classification accuracy. Extensive comparative experiments and analyses demonstrate that the proposed method produces good fusion results and outperforms state-of-the-art CESM classification techniques, achieving an accuracy of 94.784%, a recall of 95.912%, a specificity of 0.945, an F1 value of 0.955, and an AUC of 0.947.

3. To remove redundant information shared between CESM modalities and obtain common features relevant to classification, this thesis proposes an unsupervised information bottleneck-based CESM classification model. The model incorporates information bottleneck theory to learn a representation common to the different modalities for CESM classification. The framework adds an unsupervised information bottleneck to a general Transformer classification framework so that it learns common feature representations across modalities and provides concise feature inputs to the classification network. The thesis also extends information bottleneck theory to multi-feature representation, helping to learn correlated features between CESM images. The proposed method achieves accuracies of 96.8% on the private dataset and 93.1% on the public dataset.

4. To obtain cross-modal common image representations related to the classification labels, this thesis incorporates the multi-feature information bottleneck to learn accurate image representations for CESM classification. The framework adds the multi-feature information bottleneck to the unsupervised information bottleneck model of the previous study so that it learns cross-modal common representations associated with the classification labels. The proposed method achieves accuracies of 97.18% on the private dataset and 94.52% on the public dataset, surpassing current state-of-the-art methods.
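The last two studies both rest on an information bottleneck objective: a classification term that retains label-relevant information and a compression term that discards the rest. The following is a minimal sketch of a standard variational information bottleneck head in PyTorch, offered only to make that objective concrete; the encoder shape, latent dimension, and weight beta are illustrative assumptions, not the thesis's exact multi-feature formulation.

```python
# Minimal sketch of a variational information-bottleneck classification head.
# Dimensions, names, and the beta weight are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBClassifier(nn.Module):
    def __init__(self, in_dim=2048, z_dim=256, num_classes=2):
        super().__init__()
        self.to_mu = nn.Linear(in_dim, z_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(in_dim, z_dim)  # log-variance of q(z|x)
        self.head = nn.Linear(z_dim, num_classes)

    def forward(self, feats):
        mu, logvar = self.to_mu(feats), self.to_logvar(feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.head(z), mu, logvar

def ib_loss(logits, labels, mu, logvar, beta=1e-3):
    # Classification term (bound on I(Z; Y)) plus a compression term
    # KL[q(z|x) || N(0, I)] (bound on I(Z; X)), weighted by beta.
    ce = F.cross_entropy(logits, labels)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return ce + beta * kl

# Usage: per-image features (e.g. from a CNN or Transformer encoder) and labels.
feats = torch.randn(8, 2048)
labels = torch.randint(0, 2, (8,))
model = IBClassifier()
logits, mu, logvar = model(feats)
loss = ib_loss(logits, labels, mu, logvar)
```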