Research On Multi-source Matrix Factorization

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: X F Que
Full Text: PDF
GTID: 2334330563953962
Subject: Computer software and theory
Abstract/Summary:
Multi-source data arises in many real-world systems. For example, in diagnosing Alzheimer's disease, doctors may collect patient information from MRI images, PET images, and CSF tests. From these different sources, doctors can observe a patient's brain structure, brain metabolism, and protein levels in the cerebrospinal fluid. This information helps doctors diagnose patients and choose corresponding treatments. Similarly, in machine learning, we can exploit the intrinsic relationships among multiple data sources to improve a model's generalization ability. Targeting multi-source data, we use matrix factorization methods and carry out the following work.

First, we use non-negative matrix factorization to address the data reconstruction problem and propose a model called regularized multi-source matrix factorization (RMSMF). Specifically, to model the correlation among data sources, RMSMF first uses non-negative matrix factorization to factorize the observed multi-source data into the product of subject factors and feature factors, which are the basic factors for data reconstruction. In this process, we assume that different subjects from the same data source share the same feature factors. Furthermore, similarity constraints are imposed on the subject factors by assuming that, for the same subject, the subject factors are similar across all sources.

Second, to reduce the noise in multi-source data and verify the effectiveness of the reconstructed data, we propose using a self-paced multi-task learning method to classify the reconstructed data. Multi-task learning uses the information shared among tasks to improve classification accuracy, while self-paced learning reduces the negative influence of noisy data and further enhances the performance of the multi-task learning method.

Third, we apply matrix factorization to the clustering of multi-source data and propose self-paced multi-task clustering (SPMTC). Specifically, we use a matrix factorization formulation of K-means clustering, which represents the data matrix as the product of a matrix of cluster centroids and an indicator matrix. The multi-task clustering method projects the multi-source data matrices into a shared subspace to share information, and then learns each independent clustering task and the shared clustering task simultaneously. A balance parameter harmonizes the independent clustering models and the shared clustering model. SPMTC applies a self-paced learning framework and a soft-weighting strategy to multi-task clustering, which achieves noise reduction.
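
The K-means formulation mentioned above can be illustrated with a minimal single-source sketch. It is not the SPMTC model itself: it only shows plain K-means written as a factorization of the data matrix into a centroid matrix times an indicator matrix, followed by one common soft-weighting rule from the self-paced learning literature (linear soft weighting). The function names, the choice of the linear rule, and the threshold parameter `lam` are assumptions made for illustration; the abstract does not say which soft-weighting scheme the thesis uses.

```python
import numpy as np


def kmeans_as_factorization(X, k, n_iter=50, seed=0):
    """Lloyd's K-means written as X ~ C @ G.T (illustrative sketch).

    X: (d, n) data matrix, one sample per column.
    Returns C, a (d, k) centroid matrix, and G, an (n, k) one-hot
    indicator matrix with exactly one 1 per row.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Initialise the centroids with k distinct samples.
    C = X[:, rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every sample to every centroid: (n, k).
        dist = ((X[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)
        labels = dist.argmin(axis=1)
        G = np.eye(k)[labels]
        # Centroid update C = X G (G^T G)^{-1}; empty clusters are left as is.
        counts = G.sum(axis=0)
        nonempty = counts > 0
        C[:, nonempty] = (X @ G)[:, nonempty] / counts[nonempty]
    return C, G


def per_sample_loss(X, C, G):
    """Squared reconstruction error of each sample under X ~ C @ G.T."""
    return ((X - C @ G.T) ** 2).sum(axis=0)


def self_paced_weights(losses, lam):
    """Linear soft weighting (assumed scheme): v_i = max(0, 1 - loss_i / lam).

    Samples with small loss get weights near 1; samples whose loss
    reaches lam are switched off with weight 0.
    """
    return np.clip(1.0 - np.asarray(losses, dtype=float) / lam, 0.0, 1.0)
```

In a self-paced loop, the weights would typically rescale each sample's contribution to the clustering objective, and `lam` would be increased gradually so that harder samples enter training later. How SPMTC combines this with the shared subspace and the balance parameter is not specified in the abstract.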
Keywords/Search Tags: matrix factorization, multi-source data, multi-task classification, multi-task clustering, self-paced learning