
Non-negative Matrix Factorization And Its Applications In Community Detection And Search Result Clustering

Posted on: 2018-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Qin
Full Text: PDF
GTID: 2310330512480171
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of the Internet in the era of big data, people are overwhelmed by an ocean of data. Extracting the important information from such massive data has become a central task in data mining and machine learning. Because of the 4V features of big data (Volume, Variety, Value, Velocity), reducing the dimensionality of massive data has become one of the main concerns. Non-negative matrix factorization has attracted wide attention because it can discover the intrinsic dimension and structure of data, offers better interpretability, and can be used directly for cluster analysis. At the same time, many real-world systems can be represented as complex networks, and finding closely connected community structures (that is, clustering the nodes of a network) is of great significance. Community detection has aroused great interest in recent years, and as it has developed, many community detection models based on non-negative matrix factorization have been proposed.

However, community detection based on non-negative matrix factorization still has several problems: (1) Non-negative matrix factorization is sensitive to its initial values. For community detection, the characteristics of the network structure need to be considered in order to design an effective strategy for selecting initial values. (2) The effectiveness of non-negative matrix factorization for community detection needs further improvement: existing models do not impose orthogonality constraints on the basis matrix factor, which would make the decomposition results sparser. (3) Existing non-negative matrix factorization methods for community detection do not consider the characteristics of the network itself. When a video network is built from viewing relations and the attribute of each node is the short title of a video, short text clustering in social media needs to be studied, and attributes and links need to be combined on the basis of non-negative matrix factorization.

To address these problems, this thesis makes the following contributions: (1) We propose a new initialization method for non-negative matrix factorization (CALS). The PageRank method is used to rank the nodes of the original matrix; taking both the significance of nodes and the distance between them into account, the K most important nodes (K being the number of communities) are selected to form the initial basis matrix. The membership matrix is then solved by the least squares method. Experimental results on both artificial and real datasets show that CALS improves not only the stability of the algorithm but also the accuracy of non-negative matrix factorization for community detection. (2) We propose a novel non-negative matrix factorization method with orthogonal constraints (ALSOC), in which the orthogonality constraint is imposed on the basis matrix. The iterative method, based on least squares, performs well on both real and synthetic datasets. The experimental results show that ALSOC not only guarantees the sparsity of the results but also improves the accuracy of the algorithm. (3) We study non-negative matrix factorization for short text clustering and apply it to clustering the search results of UGC (User Generated Content) on YOUKU, building a prototype video theme analysis system for YOUKU. The search results are reorganized to improve their diversity and to provide users with multi-level choices.
Keywords/Search Tags:Non-negative Matrix Factorization, Initialize, Community Detection, Search Result Cluster, Alternating Least Square
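
The abstract gives only an outline of the initialization idea: rank nodes with PageRank, pick K important and mutually distant nodes as the initial basis matrix, then solve the membership matrix by least squares. A minimal sketch of that outline, under stated assumptions, might look as follows. It is not the thesis's CALS algorithm. The function names (pagerank, select_seeds, nmf_als), the "distance" rule (a candidate is skipped if it is adjacent to an already chosen seed), the use of self-loops in the toy graph, and the clipping of negative least-squares solutions to zero are all assumptions made for illustration.

```python
import numpy as np

def pagerank(A, damping=0.85, iters=100):
    """Power-iteration PageRank for an undirected graph with adjacency matrix A."""
    n = A.shape[0]
    deg = A.sum(axis=0).astype(float)
    deg[deg == 0] = 1.0                  # isolated nodes: avoid division by zero
    P = A / deg                          # column j is divided by deg[j]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1.0 - damping) / n + damping * (P @ r)
    return r

def select_seeds(A, k):
    """Pick k high-PageRank nodes, skipping nodes adjacent to a chosen seed.

    The adjacency test is an assumed stand-in for the 'distance' criterion
    that the abstract mentions but does not define.
    """
    order = [int(v) for v in np.argsort(-pagerank(A))]
    seeds = []
    for v in order:
        if len(seeds) < k and all(A[v, s] == 0 for s in seeds):
            seeds.append(v)
    for v in order:                      # fallback: fill up by rank alone
        if len(seeds) < k and v not in seeds:
            seeds.append(v)
    return seeds

def nmf_als(A, k, iters=50):
    """Factor A ~ W @ H with W, H >= 0 by alternating clipped least squares."""
    seeds = select_seeds(A, k)
    W = A[:, seeds].astype(float)        # seed columns form the initial basis matrix
    for _ in range(iters):
        # Solve for the membership matrix H with W fixed, then clip negatives to 0.
        H = np.clip(np.linalg.lstsq(W, A, rcond=None)[0], 0.0, None)
        # Solve for the basis matrix W with H fixed, then clip negatives to 0.
        W = np.clip(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0, None)
    return W, H, seeds

# Toy graph (assumed example): two 4-node cliques with self-loops, joined by one edge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
A[3, 4] = A[4, 3] = 1.0

W, H, seeds = nmf_als(A, k=2)
labels = H.argmax(axis=0)                # node j joins the community with the largest H[:, j]
print(seeds, labels)
```

Clipping after an unconstrained least-squares solve is a simple heuristic and does not guarantee that the reconstruction error decreases at every step; a non-negative least-squares solver could be substituted if that guarantee matters.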