With the advent of the big data era, vast amounts of multimedia data are flooding people's digital lives. As a new and efficient information retrieval paradigm, cross-modal retrieval meets the urgent need for multi-modal information retrieval and has become a research hotspot. How to mine the semantic information of multi-modal data and make full use of the implicit semantic relationships between modalities are the key challenges of cross-modal research. Current cross-modal retrieval research generally relies on multi-modal datasets with massive labeled samples. However, industrial applications such as vehicle video, surveillance video, and remote sensing imagery contain large amounts of unlabeled data; owing to missing modalities, low data quality, and high labeling costs, only a small number of usable samples are available. Such data can be defined as small-sample multi-modal data, characterized by scarce available data and by one modality having far fewer samples than another. Training a model on small-sample multi-modal data is difficult and leads to low cross-modal retrieval accuracy; this is defined as the small-sample cross-modal retrieval problem. To solve this problem, this thesis conducts in-depth research on cross-modal retrieval based on deep learning and transfer learning. The main contributions are as follows:

(1) A cross-modal task learning framework based on deep learning is proposed, and an end-to-end Cross-Modal Retrieval and Recognition Net (CMR2Net) is constructed. CMR2Net uses similarity measurement to fuse features (see the first sketch below) and analyzes semantic relationships to associate the high-level features of heterogeneous data, solving the problem of semantic computation between different modalities. To evaluate CMR2Net's cross-modal retrieval performance, a sample cross-matching organization method is used to construct the Special Vehicles Multimode Dataset (SVMD). Image-audio cross-modal retrieval experiments on SVMD show that CMR2Net achieves high retrieval accuracy and effectively learns the semantic correlation between different modalities.

(2) A cross-modal retrieval method for remote sensing images based on transfer learning is proposed. To address cross-modal retrieval with small-sample data, a Transfer Cross-Modal Retrieval and Recognition Net (TCMR2Net) is further constructed; it transfers the model structure and low-level parameters of CMR2Net (see the second sketch below). To evaluate TCMR2Net's cross-modal retrieval performance, visible and near-infrared remote sensing images from the GF-2 satellite are used to construct the Remote Sense Airplane Multimode Dataset (RSAMD). Visible-near-infrared cross-modal retrieval experiments on RSAMD show that TCMR2Net effectively transfers low-level knowledge across modalities and clearly outperforms a model trained without knowledge transfer.

Deep learning and transfer learning are used to mine the latent semantic relationships of multi-modal data, enabling cross-modal retrieval to achieve high precision on small-sample datasets while effectively reducing data preprocessing costs. The method offers theoretical guidance for scientific problems such as small-sample cross-modal retrieval and cross-modal target recognition, and the related algorithms provide a useful reference for developing application systems for special-vehicle recognition in driverless cars, cross-modal target detection in remote sensing images, and intelligent information extraction from remote sensing data.
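
First sketch: the abstract states that CMR2Net fuses features through similarity measurement but gives no implementation details, so the following is only a minimal illustrative sketch of similarity-based cross-modal matching in that spirit. The encoder architecture, the feature dimensions (2048 for image features, 1024 for audio features), and the choice of cosine similarity are assumptions, not the thesis's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEncoder(nn.Module):
    """Maps one modality's features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )

    def forward(self, x):
        # L2-normalize so dot products equal cosine similarity
        return F.normalize(self.net(x), dim=-1)

# Assumed input dimensions for the two modality branches
image_enc = CrossModalEncoder(in_dim=2048)   # e.g. CNN image features
audio_enc = CrossModalEncoder(in_dim=1024)   # e.g. spectrogram features

def retrieval_scores(img_feats, aud_feats):
    """Cosine-similarity matrix; row i ranks all audio samples against image i."""
    return image_enc(img_feats) @ audio_enc(aud_feats).T

# Toy usage: 8 image queries against 100 audio candidates
img = torch.randn(8, 2048)
aud = torch.randn(100, 1024)
scores = retrieval_scores(img, aud)          # shape (8, 100)
top5 = scores.topk(k=5, dim=1).indices       # top-5 retrieved audio per image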
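
Second sketch: the abstract says only that TCMR2Net transfers the structure and low-level parameters of CMR2Net, so this sketch shows one common way to realize that idea: copy the pretrained weights, freeze the low-level layer, and fine-tune the rest on the small-sample target data. The encoder shape, which layers count as "low-level", and the optimizer settings are all assumptions.

import torch
import torch.nn as nn

def make_encoder(in_dim: int, embed_dim: int = 256) -> nn.Sequential:
    # Same two-layer encoder shape as in the previous sketch
    return nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                         nn.Linear(512, embed_dim))

pretrained = make_encoder(2048)   # stands in for a CMR2Net branch trained on SVMD
target = make_encoder(2048)       # stands in for a TCMR2Net branch for RSAMD

# Transfer the structure and parameters, then freeze the low-level layer
target.load_state_dict(pretrained.state_dict())
for p in target[0].parameters():  # target[0] is the first (low-level) Linear
    p.requires_grad = False

# Fine-tune only the remaining (high-level) parameters on the small dataset
optimizer = torch.optim.Adam(
    (p for p in target.parameters() if p.requires_grad), lr=1e-4)

Freezing the transferred low-level layer is one plausible reading of "transfers the low-level parameters"; the thesis may instead fine-tune all layers at different learning rates.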