With the rapid development of medical informatization, various hospital information systems have accumulated a rich variety of medical data, such as radiology reports, CT images, PET images, and X-ray images. Medical data has thus become another important type of cross-modal data alongside natural scene datasets. Medical cross-modal retrieval aims to use samples of one modality (e.g., X-ray images) to retrieve semantically similar samples of another modality (e.g., radiology reports). The main challenges of medical cross-modal retrieval are the "semantic gap" between the low-level features and the high-level semantics of data within the same modality, and the "heterogeneity gap" between the low-level features and the interrelated high-level semantics of different modalities. In this paper, we propose two medical cross-modal retrieval methods that use hash learning to address the "semantic gap" and the "heterogeneity gap" between samples of different modalities. The main work of this paper is as follows:

1. A Deep Medical Cross-modal Attention Hashing retrieval method is proposed. To address the problem that existing deep learning methods do not extract detailed semantic information when encoding the global features of images and text into hash codes, we propose the deep medical cross-modal attention hashing retrieval method (DMCAH). Specifically, the global features of X-ray images and radiology reports are first extracted using CNN-F and word embeddings, respectively. We then move recursively from coarse-grained to fine-grained image regions to extract discriminative features in each region, and, in parallel, move recursively from the sentence level to the word level to extract distinguishable semantic information from the reports. The finer features are then aggregated by an adaptive attention mechanism to obtain local features for each modality. Finally, to narrow the semantic gap, the image and report features are mapped into a common space, where discriminative hash codes are obtained, as sketched in the code below. Experimental results show that on the large-scale medical dataset MIMIC-CXR, the mean average precision (mAP) score of DMCAH is 2.73% and 2.11% higher than that of the classical deep learning methods DCMH and SSAH, respectively; on the natural scene dataset MS-COCO, its mAP score is 2.10% and 1.45% higher than that of DCMH and SSAH, respectively.
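A minimal sketch of this aggregation-and-hashing step is given below. It assumes PyTorch; the module names (AdaptiveAttentionPool, HashHead), feature dimensions, and 64-bit code length are illustrative choices rather than the exact DMCAH implementation.

```python
# Illustrative sketch of attention-based aggregation followed by hashing.
# All names, dimensions and the attention form are assumptions, not DMCAH itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttentionPool(nn.Module):
    """Aggregate region/word features into one local feature with learned weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                            # feats: (batch, regions, dim)
        weights = F.softmax(self.score(feats), dim=1)    # attention over regions/words
        return (weights * feats).sum(dim=1)              # (batch, dim)

class HashHead(nn.Module):
    """Map modality features into the common space; tanh relaxes the binary code."""
    def __init__(self, in_dim, code_len):
        super().__init__()
        self.fc = nn.Linear(in_dim, code_len)

    def forward(self, x):
        return torch.tanh(self.fc(x))

# Toy forward pass: 4096-d global CNN features plus 36 region features per image,
# and 300-d word embeddings for a 40-token report (batch of 8).
img_global  = torch.randn(8, 4096)
img_regions = torch.randn(8, 36, 4096)
txt_words   = torch.randn(8, 40, 300)

img_local = AdaptiveAttentionPool(4096)(img_regions)
txt_local = AdaptiveAttentionPool(300)(txt_words)

img_code = HashHead(4096 + 4096, 64)(torch.cat([img_global, img_local], dim=1))
txt_code = HashHead(300, 64)(txt_local)

binary_img = torch.sign(img_code)   # {-1, +1} codes used at retrieval time
binary_txt = torch.sign(txt_code)
```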
2. A Medical Cross-modal Multiscale Fusion Category-supervised Hashing retrieval method is proposed. To address the problem that existing cross-modal retrieval methods rely on a semantic similarity matrix to supervise hash-code learning, which ignores rich semantic information and corrupts the semantic structure, we propose the medical cross-modal multiscale fusion category-supervised hashing retrieval method (MCMFCH). Specifically, the method first trains a category hash network to learn a hash code for each category, so that the learned code carries the semantic information of its corresponding category. The learned category hash codes are then used to represent the labels and serve as supervisory information to guide the hash-code learning of the image, text, and joint networks; meanwhile, the joint network guides the learning of the hash codes for images and reports, as sketched in the code below. Experimental results show that on the large-scale medical dataset MIMIC-CXR, the mAP of MCMFCH is on average 6.62% higher than that of traditional shallow methods and 0.57% higher than that of DCMH; on the natural scene dataset MS-COCO, its mAP is on average 12.76% higher than that of traditional shallow methods and 1.60% higher than that of DCMH.
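A minimal sketch of the category-supervision idea follows. It assumes PyTorch; the label-network architecture, the mean-squared-error supervision term, and the 14-label, 64-bit setup are illustrative assumptions rather than the exact MCMFCH formulation.

```python
# Illustrative sketch: a label ("category") network learns a hash code per label
# vector, and those codes then supervise the image/text/joint hash codes.
# Sizes, the loss and variable names are assumptions, not MCMFCH itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelHashNet(nn.Module):
    """Encode a multi-label vector into a relaxed hash code that carries
    the semantic information of its categories."""
    def __init__(self, num_classes, code_len):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 512), nn.ReLU(),
            nn.Linear(512, code_len),
        )

    def forward(self, labels):
        return torch.tanh(self.net(labels))

def supervision_loss(modality_code, target_code):
    """Pull a modality hash code toward a fixed target code (one possible term)."""
    return F.mse_loss(modality_code, target_code.detach())

# Toy usage: 14 binary finding labels per sample, 64-bit codes, batch of 8.
labels    = torch.randint(0, 2, (8, 14)).float()
label_net = LabelHashNet(num_classes=14, code_len=64)
cat_codes = label_net(labels)                        # category hash codes

img_codes  = torch.tanh(torch.randn(8, 64))          # stand-in for the image head
txt_codes  = torch.tanh(torch.randn(8, 64))          # stand-in for the report head
joint_code = torch.tanh((img_codes + txt_codes) / 2) # stand-in joint network

loss = (supervision_loss(img_codes, cat_codes)
        + supervision_loss(txt_codes, cat_codes)
        + supervision_loss(joint_code, cat_codes)
        # the joint network additionally guides the unimodal codes
        + supervision_loss(img_codes, joint_code)
        + supervision_loss(txt_codes, joint_code))
```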