
Research on Multi-view Metric Learning for Multi-modal Data Based on the DWH Model

Posted on: 2019-09-02  Degree: Master  Type: Thesis
Country: China  Candidate: K Yang
GTID: 2428330566986588  Subject: Computer Science and Technology
Abstract/Summary:
With the development of society and technology, the volume of multi-modal data, such as social media data, is growing explosively. In machine learning, multi-view learning is sometimes needed to generate multiple views from the original data sets for statistical analysis, and metric learning is sometimes needed because a good distance metric function is critical in information retrieval, clustering, and classification. However, existing state-of-the-art work seldom addresses multi-view metric learning, which combines the two. Given the special structure of multi-modal data, a single view may not fully exploit the value of the data. At the same time, the traditional Euclidean or Mahalanobis distance is inappropriate for metric learning in this setting, because there may be correlation constraints between variables. Multi-view metric learning is therefore a promising solution. Based on the Dual Wing Harmonium (DWH) model, this dissertation combines multi-view learning and metric learning to study multi-view metric learning for multi-modal data. The thesis consists of two main parts.

1) A multi-view metric learning algorithm called MVDML is proposed based on the Dual Wing Harmonium model. The algorithm extracts a variety of information from multi-modal data and embeds multiple views into a single low-dimensional latent space. By minimizing the distance between similar pairs while maximizing the distance between dissimilar pairs during supervised learning, it seeks to learn the optimal distance metric. When the data have more than two modalities, this dissertation extends the Dual Wing Harmonium model to a Triple Wing Harmonium model. In addition, the generation of pairwise constraints is optimized, and Dask and Numba can be used in data pre-processing for parallel acceleration. The experimental results show that the algorithm is effective and scalable, with running time sharply reduced by code-level parallel acceleration.

2) Key information is extracted from multi-modal data through feature engineering, and the original data are converted into feature vectors that serve as the input of the algorithm, with the aim of improving model accuracy. This dissertation focuses mainly on the IMDb dataset and extracts information from three modalities: user data, movie data, and comment data. To estimate the model parameters, the joint likelihood and loss are optimized in the latent space model, so that the method balances explaining the data against providing an effective distance metric, which naturally helps avoid overfitting. The experimental results show that the algorithm performs best in classification and retrieval among the compared mainstream models and algorithms, with competitive computing time.
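The pairwise-constraint idea described in part 1 can be illustrated with a minimal sketch. This is not the MVDML objective from the thesis: it assumes that embeddings in the latent space are already available, that similar and dissimilar pairs are derived from class labels, and that a generic contrastive-style loss with a hypothetical `margin` parameter stands in for the actual loss. The function names `generate_pairs` and `pairwise_loss` are illustrative only.

```python
import math
from itertools import combinations


def generate_pairs(labels):
    """Split all index pairs into similar (same label) and dissimilar (different label)."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            similar.append((i, j))
        else:
            dissimilar.append((i, j))
    return similar, dissimilar


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_loss(embeddings, similar, dissimilar, margin=1.0):
    """Contrastive-style loss (an assumption, not the thesis's loss).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only when they are closer than `margin`.
    """
    pull = sum(euclidean(embeddings[i], embeddings[j]) ** 2 for i, j in similar)
    push = sum(
        max(0.0, margin - euclidean(embeddings[i], embeddings[j])) ** 2
        for i, j in dissimilar
    )
    return pull + push


# Toy usage: three samples, two classes, 2-D latent embeddings.
labels = [0, 0, 1]
similar, dissimilar = generate_pairs(labels)
well_separated = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
poorly_separated = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
loss_good = pairwise_loss(well_separated, similar, dissimilar)
loss_bad = pairwise_loss(poorly_separated, similar, dissimilar)
```

In this toy setting the loss is lower when same-class samples coincide and the other class lies beyond the margin. Enumerating all pairs grows quadratically with the number of samples, which is one plausible reason to optimize and parallelize constraint generation.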
Keywords/Search Tags: DWH model, Multi-view learning, Metric learning, Multi-modal data