Research On Key Techniques Of Cross-Domain Multimodal Data Analysis

Posted on: 2020-05-10

Degree: Master

Type: Thesis

Country: China

Candidate: J X Du

Full Text: PDF

GTID: 2428330590474456

Subject: Computer Science and Technology

Abstract/Summary:

Cross-domain multi-modal datasets play an important role in many fields. This thesis studies data fusion between multi-field categorical data and multivariate time series in a credit setting. Existing research on cross-domain multi-modal data fusion focuses mainly on fusing text with images and video with audio. Fusion of multi-field categorical data with multivariate time series has hardly been studied, and the two kinds of data differ greatly in structure and semantics. The major difficulty addressed in this thesis is how to design a model that both extracts the information contained in each modality and fuses the two kinds of data effectively by bridging those structural and semantic differences.

The thesis has three main parts. First, a new model named CFM is proposed to learn feature representations of multi-field categorical data. Building on the FM model, CFM adopts the Interaction Layer + K-Max Pooling Layer structure of the CAT2VEC model, which addresses the inability of traditional models to capture high-order latent interactions between features. Second, a model named IA-RNN is proposed for feature representation learning on multivariate time series. By introducing Input-Attention into the basic LSTM unit, IA-RNN can capture the differing importance of each dimension at each time point, so the latent information in the multivariate time series can be better extracted. Finally, three data fusion models are presented: an end-to-end model based on Early Fusion, a non-end-to-end model based on Early Fusion, and an end-to-end model based on Late Fusion. Comparison experiments are used to summarize the advantages and disadvantages of the proposed models.

In summary, the thesis adopts a deep learning approach. Two feature representation learning models are first designed, one for multi-field categorical data and one for multivariate time series, to extract as much information as possible from each kind of data. Three fusion methods are then applied to the learned features. Experimental results show that the CFM model outperforms the benchmarks and some advanced models on classification of multi-field categorical data in the CTR prediction field, and that the IA-RNN model does so on classification of multivariate time series. Among the three proposed fusion models, the end-to-end model based on Late Fusion gives the best results, and on both evaluation criteria, accuracy and AUC, all three models perform no worse than the two ensemble learning models.
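The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every function name, the `scores` argument and the `alpha` mixing weight are assumptions made for illustration. K-max pooling is shown in one common variant (keep the k largest values in their original order), input attention as a softmax reweighting of the dimensions at one time step, early fusion as feature concatenation, and late fusion as a weighted average of per-modality predictions. The abstract does not state which exact variants the author used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(x_t, scores):
    """Reweight each dimension of one time step by an attention weight.

    x_t:    values of every series dimension at time t.
    scores: hypothetical relevance scores; in a trained model these would
            come from learned parameters, here they are passed in directly.
    """
    weights = softmax(scores)
    return [w * x for w, x in zip(weights, x_t)]


def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


def early_fusion(cat_features, ts_features):
    """Early fusion: join both representations before a single classifier."""
    return list(cat_features) + list(ts_features)


def late_fusion(p_cat, p_ts, alpha=0.5):
    """Late fusion: mix the probabilities predicted by each modality.

    alpha is an assumed mixing weight, not a value from the thesis.
    """
    return alpha * p_cat + (1 - alpha) * p_ts


if __name__ == "__main__":
    # Toy values only; they are not data from the thesis.
    print(k_max_pooling([0.1, 0.9, 0.3, 0.7], k=2))
    print(input_attention([2.0, 4.0], scores=[0.0, 0.0]))
    print(early_fusion([1.0, 2.0], [3.0]))
    print(late_fusion(0.8, 0.4))
```

The two fusion functions differ only in where the modalities meet: early fusion hands one joint vector to a single downstream classifier, while late fusion keeps a separate predictor per modality and merges their outputs.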

Keywords/Search Tags:

multi-field categorical data, multivariate time series, data fusion, feature representation

Related items

1	Study On Water Quality Time Series Data Mining And Application Integration
2	Research On Representation And Clustering Methods Based On Time-series Data
3	Visualizing categorical time series data with applications to computer and communications network traces
4	Analysis Of Multivariate Time Series Under Big Data Environment
5	Multi-modal Time Series Data Error Discovery Algorithm Based On Hybrid Attention Mechanism
6	Research On Robust Deep Modeling Methods For Characteristics Of Time Series Data
7	Research On Feature Representation And Classification Methods In Time Series Data Mining
8	Multivariate Time Series Similarity Analysis Method And Application In Data Mining
9	Multi-feature Representation Learning And Its Application To Multimedia Data Prediction
10	Anomaly Detection Of Multivariate Time Series Data Based On Representation Learning