
Multimodal Classification System Based On Image And User Access Time Series

Posted on: 2021-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhang
Full Text: PDF
GTID: 2518306503973849
Subject: Software engineering

Abstract/Summary:
A modality refers to the way something happens or is experienced [1]. Multimodality is the combination of two or more modalities in various forms; when a data set or a study involves multiple modalities, the research is called multimodal learning. In the single-modal era, scholars built models from the single-modal information provided by each task. For example, recommendation research used to build models from user ratings alone; with the rise of multimodal learning, the addition of item images and user comments has produced major breakthroughs in recommendation. Multimodal research has therefore become increasingly popular. As a sub-problem of multimodal learning, multimodal classification of urban functions based on images and user access sequences combines users' access time series with remote sensing images to obtain a more accurate classification of regional functions, which provides a valuable reference for the governance and refinement of modern cities.

This paper carries out the following research work:

(1) For multimodal representation, a deep SE-ResNet is used to extract feature representations from large-scale remote sensing images, and an attention neural network, an LSTM, and a GRU are used to extract feature representations from user access time series.

(2) For multimodal fusion, different measures are adopted to fuse the user access features and image features extracted by the different models: concatenation (stitching) is used to fuse the user access features extracted by the attention network with the image features extracted by the deep SE-ResNet, while the outer product is used to fuse the user access features extracted by the LSTM and GRU with the image features extracted by the deep SE-ResNet.

(3) A Fourier-approximation multi-kernel support vector machine is used for classification, and this model is compared with both traditional and state-of-the-art classification methods.

(4) The classification performance of the proposed model is verified experimentally. Based on this model, a prototype urban function classification system is designed and implemented; it mainly provides user registration and login, file upload, classification queries, and history queries.

This thesis first introduces the research background and significance of multimodal learning, analyzes the development of existing research in the field, points out open problems in the most widely studied areas of multimodal representation and fusion, and derives the technical route of this paper from the practical problem of classifying urban area functions. The proposed model is then studied and designed in terms of its computation, structure, and training methods, and its classification characteristics are verified experimentally on open data sets. Finally, a prototype classification system based on images and user access records is designed and implemented, and the system is presented graphically.
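To make the representation step in (1) concrete, the following is a minimal sketch of attention pooling over a user-access time series. It is not the thesis's implementation: the hidden states H stand in for LSTM/GRU outputs, and the attention query w would be a trained parameter; all shapes here are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
T, h = 24, 16                 # hypothetical: 24 time steps, hidden size 16
H = rng.normal(size=(T, h))   # stand-in for per-step LSTM/GRU hidden states
w = rng.normal(size=h)        # attention query (a trainable parameter)

alpha = softmax(H @ w)        # attention weights over the T time steps
feat = alpha @ H              # weighted sum -> one fixed-length feature vector

print(feat.shape)             # the pooled time-series representation
```

The weighted sum collapses a variable-length access sequence into a single vector that can then be fused with the image features.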
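The two fusion strategies in (2) can be sketched as follows. This is an illustrative toy, not the thesis's code: the feature dimensions d_img and d_seq are hypothetical, and the vectors are random stand-ins for SE-ResNet image features and attention/LSTM/GRU time-series features.

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_seq = 8, 6                 # hypothetical feature dimensions
img_feat = rng.normal(size=d_img)   # stand-in for SE-ResNet image features
seq_feat = rng.normal(size=d_seq)   # stand-in for time-series features

# Stitching (concatenation): output dimension d_img + d_seq.
concat_fused = np.concatenate([img_feat, seq_feat])

# Outer product: models every pairwise interaction between the two
# modalities; flattened output dimension is d_img * d_seq.
outer_fused = np.outer(img_feat, seq_feat).ravel()

print(concat_fused.shape)  # (14,)
print(outer_fused.shape)   # (48,)
```

Concatenation keeps the fused vector small, while the outer product trades a quadratically larger vector for explicit cross-modal interaction terms.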
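For the classifier in (3), the "Fourier approximation" of a kernel SVM commonly refers to random Fourier features (Rahimi and Recht), which map inputs to a space where a linear classifier approximates an RBF-kernel machine. The sketch below shows only that kernel approximation, under the assumption that this is the technique meant; the feature count, gamma, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, n_features, gamma, rng):
    """Random Fourier feature map approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

gamma = 1.0
X = rng.normal(size=(5, 3))          # toy data: 5 points in 3 dimensions
Z = rff_map(X, 2000, gamma, rng)     # explicit feature map

approx = Z @ Z.T                     # approximate kernel matrix
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-gamma * sq)          # exact RBF kernel matrix

print(np.abs(approx - exact).max())  # approximation error shrinks as
                                     # n_features grows
```

A linear SVM trained on Z (e.g. with any linear-SVM solver) then behaves like an RBF-kernel SVM at a fraction of the cost; a multi-kernel variant would combine several such maps with different gamma values.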
Keywords/Search Tags: Multimodal learning, Deep learning, Remote sensing image, Series data, Multi-classification