| With rapid advances in Internet and computer technology, we have entered the era of multimedia big data. Data appears in many modalities, such as text, images and video. Data of different modalities presents the same semantic information from different perspectives, and together these modalities make the representation of things richer and more diverse. Despite the large differences in their underlying representations, cross-media retrieval is possible because data of different modalities is correlated in a high-dimensional semantic space. In this paper, we design and implement a cross-media retrieval method covering five modalities: text, image, video, audio and 3D model, focusing on optimizing retrieval accuracy and retrieval efficiency. The main research contents are summarized as follows: (1) A cross-media association learning model is proposed. To bridge the "heterogeneity gap" of cross-media data, we propose a cross-media association computation model for multiple modalities that incorporates an external knowledge base. First, we extract fine-grained features from cross-media data with long short-term memory (LSTM) neural networks, and design loss functions from both semantic and distributional perspectives to minimize the distance between classes of different modal data and thereby maximize cross-media correlation. Meanwhile, to address the semantic sparsity prevalent in cross-media datasets, we propose a keyword-based term frequency-inverse document frequency (TF-IDF) method that introduces an external knowledge base to constrain the intra-class distance of cross-media data, further improving the accuracy of association learning. Experiments demonstrate the effectiveness of the proposed method, which achieves better retrieval results on the large-scale multimodal cross-media dataset XMediaNet. (2) A method for constructing a cross-media heterogeneous information network is proposed. To improve the practicality of cross-media data and take into account the characteristics of
cross-media data itself, we model cross-media data with a heterogeneous information network. Through this network, heterogeneous cross-media data is unified in a single "graph" data structure, and the network is then simplified by pruning edges, yielding a more streamlined cross-media heterogeneous information network while preserving performance. Corresponding update strategies are also defined for the network, covering node deletion, insertion and update. Experiments show that the constructed network accurately expresses the correlations among cross-media data, providing a better basis for the subsequent research. (3) A cross-media retrieval method based on graph embedding hashing is proposed. To reduce the time and storage overhead of retrieving from large-scale cross-media datasets on the cross-media heterogeneous information network, and motivated by the retrieval efficiency that hashing methods achieve, we propose a cross-media retrieval method based on graph embedding hashing. Using matrix decomposition, the cross-media heterogeneous information network is embedded into a discrete Hamming space by a hash learning approach that learns binary hash codes directly, preserving the integrity of the network's relationships as far as possible. Experiments show that the method greatly improves the time efficiency of retrieval and reduces storage overhead with almost no loss of retrieval accuracy. In summary, this paper approaches the cross-media retrieval problem from two aspects, "how to retrieve" and "how to retrieve better", uses the construction of a cross-media heterogeneous information network as a bridge between them, and optimizes cross-media retrieval as a whole. The experimental results show that the proposed approach improves the retrieval results and
retrieval efficiency of cross-media retrieval on large and complex datasets. |
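
The Hamming-space retrieval step in contribution (3) can be illustrated with a minimal sketch. It is not the hash learning method of this work: the sign-threshold `binarize` function, the toy embeddings and the item names are all hypothetical, chosen only to show why comparing binary codes is cheap in time and storage (one XOR and a bit count per comparison, one integer per item).

```python
from typing import Dict, List, Sequence, Tuple


def binarize(embedding: Sequence[float]) -> int:
    """Pack the signs of a real-valued embedding into an integer hash code.

    Sign thresholding is an illustrative assumption, not a learned hash function.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: Dict[str, int], top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank database items by Hamming distance to the query (ties broken by name)."""
    ranked = sorted(
        ((name, hamming(query, code)) for name, code in database.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:top_k]


# Hypothetical node embeddings for items of different modalities.
embeddings = {
    "image_cat": [0.9, -0.2, 0.4, -0.7],
    "text_cat": [0.8, -0.1, 0.3, -0.5],
    "audio_dog": [-0.6, 0.7, 0.2, -0.4],
    "video_car": [-0.3, -0.9, -0.5, 0.8],
}
database = {name: binarize(vec) for name, vec in embeddings.items()}

# A query from any modality is hashed the same way and compared bitwise.
query_code = binarize([0.7, -0.3, 0.5, -0.6])
results = retrieve(query_code, database, top_k=2)
print(results)
```

In this toy setting, items of different modalities that share a semantic class receive nearby codes, so a query retrieves them together regardless of modality. A real system would learn the codes from the heterogeneous information network rather than threshold fixed embeddings.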