| Cross-domain multi-modal datasets play an important role in many fields. This thesis studies the problem of fusing multi-field categorical data with multivariate time series in the context of credit. Existing research on cross-domain multi-modal data fusion focuses mainly on fusion between text and images and between video and audio. Fusion between multi-field categorical data and multivariate time series has hardly been studied, and the two kinds of data differ greatly in structure and semantics. The main difficulty addressed in this thesis is how to design a model that can both extract the information contained in each modality and fuse the two kinds of data effectively by bridging these structural and semantic differences. The thesis consists of three parts. First, a new model named CFM is proposed to learn feature representations of multi-field categorical data. Building on the FM model, CFM adopts the Interaction Layer + K-Max Pooling Layer structure of the CAT2VEC model, which addresses the inability of traditional models to capture high-order latent interactions between features. Second, a model named IA-RNN is proposed for feature representation learning on multivariate time series. By introducing Input-Attention into the basic LSTM unit, IA-RNN can capture the differing importance of each dimension at each time point, so the latent information contained in the multivariate time series can be extracted more fully. Finally, this thesis presents three data fusion models: an end-to-end model based on Early Fusion, a non-end-to-end model based on Early Fusion, and an end-to-end model based on Late Fusion. Comparison experiments are used to summarize the advantages and disadvantages of the proposed models. In summary, this thesis mainly adopts deep learning methods. First, we design two feature representation learning models for multi-field categorical data and 
multivariate time series, respectively, to maximize the information extracted from the two kinds of data. Then, based on the learned features, three different fusion methods are applied. Experimental results show that the CFM model outperforms benchmarks and some advanced models on classification tasks for multi-field categorical data in the CTR prediction field, and that the IA-RNN model does likewise on classification tasks for multivariate time series. Among the three proposed fusion models, the end-to-end model based on Late Fusion achieves the best results, and all three models perform no worse than the two ensemble learning models on both evaluation criteria, accuracy and AUC. |
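
The difference between the Early Fusion and Late Fusion strategies named above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the single linear-plus-sigmoid heads, and the mixing weight `alpha` are all assumptions made for illustration, and `cat_repr` and `ts_repr` merely stand in for the representations that models such as CFM and IA-RNN would produce.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights, bias, features):
    """Dot product of weights and features, plus a bias term."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def early_fusion_predict(cat_repr, ts_repr, weights, bias):
    """Early fusion (illustrative): concatenate both representations,
    then score the fused vector with one shared classifier head."""
    fused = list(cat_repr) + list(ts_repr)
    return sigmoid(linear_score(weights, bias, fused))


def late_fusion_predict(cat_repr, ts_repr, cat_head, ts_head, alpha=0.5):
    """Late fusion (illustrative): score each modality with its own head,
    then mix the two probabilities; alpha is a hypothetical mixing weight."""
    p_cat = sigmoid(linear_score(cat_head[0], cat_head[1], cat_repr))
    p_ts = sigmoid(linear_score(ts_head[0], ts_head[1], ts_repr))
    return alpha * p_cat + (1.0 - alpha) * p_ts


# Toy representations; the values are made up for demonstration only.
cat_repr = [0.2, -0.1]
ts_repr = [0.5]
p_early = early_fusion_predict(cat_repr, ts_repr, [1.0, 1.0, 1.0], 0.0)
p_late = late_fusion_predict(cat_repr, ts_repr, ([1.0, 1.0], 0.0), ([1.0], 0.0))
```

In this sketch, early fusion lets one classifier see both modalities at once, while late fusion keeps the modalities separate until their predictions are combined. Whether the fusion weights are trained jointly with the representation models (end-to-end) or on frozen representations (non-end-to-end) is a separate choice that the sketch does not cover.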