The Prediction Of Binary Relation In The Large-scale Social Networks

Posted on: 2015-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2180330473953150
Subject: Computer software and theory
Abstract/Summary:
Binary relation prediction in large-scale social networks is the task of predicting the binary relation carried by each edge from the known structural information of the network. It matters both for studying network structure and for mining network information. The best existing algorithm, proposed by Jure Leskovec et al., predicts the binary relation with a Logistic Regression model. Logistic Regression, however, is a linear model and cannot fit large-scale, complex social relations well.

This thesis introduces three improvements to that algorithm. First, Leskovec et al. chose 23 features to describe the network, but our analysis found that some of them are linearly correlated; such features contribute nothing to prediction and waste time and space. The thesis therefore proposes 10 new features, combines them with the original 23 for a total of 33, uses PCA as a feature selection method to reduce the dimensionality and retain the features most important for prediction, and then predicts the binary relation. Second, Leskovec et al. classified the binary relation with Logistic Regression, which suits linear classification, whereas relations in social networks are not linearly separable; the thesis therefore uses an SVM model, which can handle non-linear classification problems. Third, the thesis applies AdaBoost, an ensemble algorithm that combines many weak classifiers into one stronger classifier.

These three improvements apply to ordinary social networks, but a large-scale social network cannot be modeled well by a single SVM model. The thesis therefore proposes two ways to divide a large-scale network into several smaller networks and model each of them: one uses the edges' EM feature, and the other uses the K-means clustering algorithm.

The experiments show that prediction accuracy improved from 84.9% to 88.37% on the Slashdot dataset, from 92.62% to 94.31% on the Epinions dataset, and from 70.16% to 75.65% on the Wikipedia dataset. They also show that the SVM model outperforms the Logistic Regression model, and that dividing a large-scale social network into several smaller networks not only overcomes the difficulty of fitting such a network with a single SVM model but also improves the prediction results.
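The AdaBoost idea mentioned in the abstract, combining many weak classifiers into one stronger classifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which weak learner the thesis uses, so single-feature threshold rules (decision stumps) stand in here, and the toy data is invented and has nothing to do with the Slashdot, Epinions, or Wikipedia datasets. The function names are hypothetical.

```python
import math

# Toy data, NOT from the thesis: one hypothetical edge feature per row,
# with labels +1 / -1 standing in for the two values of the binary relation.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(stump, row):
    """Weak classifier: predict `polarity` if feature j >= thr, else -polarity."""
    j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def train_stump(X, y, w):
    """Return (weighted error, stump) for the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def adaboost_fit(X, y, n_rounds=3):
    """Train `n_rounds` stumps, re-weighting the samples after each round."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of all stumps; the sign of the score is the predicted label."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
print(predictions == y)
```

On this toy data no single stump separates the two labels, while the weighted vote of three stumps does. In practice one would more likely rely on a library implementation than hand-roll the loop; the sketch only shows the re-weighting mechanism.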
Keywords/Search Tags: Social Networks, Binary Relation, Link Predicting, Machine Learning, SVM