Font Size: a A A

Graph Sampling For Social Network Embedding

Posted on:2019-09-21Degree:MasterType:Thesis
Country:ChinaCandidate:M Q MaFull Text:PDF
GTID:2370330548958925Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of hardwares’ computing capability,people are getting used to analyzing the original dataset rather than the small samples.In this scenario,larger-scale analysis algorithms for complex networks has become a norm.Among these algorithms,network embedding/representation algorithms could transform un-structural linkage data into structural vector data,making solving traditional tasks on complex networks easier by utilizing machine learning technics.However,the massive scales of complex networks have caused troubles for implementing these technics.Therefore,how to design low-cost network embedding algorithms implementations is still a critical problem.The social network on World Wide Web is ubiquitous and widely applied,and also is typical large-scale complex network.Therefore,it is of great significance to design a low-cost network embedding implementation for the social network.In this paper,we explored the ways to reduce cost of network embedding learning process through designing graph sampling methods.The contents are as follows:First,we surveyed the literature for a group of candidate subgraph features for graph sampling consideration.Then we used the decision tree model in machine learning to select critical features.After this,we designed several relatively-controllable graph sampling methods such as:spanning trees,random walk sampling with parameters,slide/anchored window sampling on reversed node degree list.By utilizing the "preferential attachment"feature in the social network,we further designed an improved slide window sampling method by directly estimate the edge count in the subgraph deduced from one slide window.Secondly,we designed the process of combining graph sampling and network embedding learning.The details include:a weighted average method for computing node vectors in the whole graph using node vectors in the subgraph;using simple link prediction as a downstream task which only needs no other than network structure information as input.Thirdly,we used this process to further inspect other features in graph sampling on random graphs and real-world networks.These features include:"best edge count",high degree nodes which left unverified during the pre-experiment,and the design that utilizing"preferential attachment" to estimate edge count in the subgraph."best edge count" is first observed during experimenting on real-world networks,and then analyzed on random graphs.Finally,we used a real-world network as an example to showcase how to use a designed graph sampling method to quickly find the desired subgraph.We explained that in a real application,it is unnecessary to get a subgraph having edge count nearly "best edge count".Sometimes,using a very small number of edges could get decent results in simple tasks as link prediction.The main contributions of this work are as follows:(1)We proposed a framework for combining graph sampling and network embedding learning.This includes graph sampling methods design and the weighted average method design.(2)We proved that a subgraph having "high edge count,high average degree" will tend to achieve good network embedding results for the whole graph.(3)We proposed "best edge count".According to the results on random graphs,the denser the graph is,the lower the ratio for "best edge count" against total edge count will be.(4)We validated on 4 real-world networks that using "preferential attachment" could accurately estimate the edge count for the subgraph.Therefore,when designing the graph sampling method for social networks or other scale-free networks,this conclusion could help with reducing the cost of reading from the graph.(5)We proposed the steps of using the improved version of slide window sampling on reversed node degree list to quickly get the desired subgraph within cost limit.The designs and conclusions from this work are not limited to social networks.Utilizing graph sampling for reducing the cost of network embedding process is general for all other networks.Having "high edge count" applies to all the network embedding algorithms which call or imply the linkage/proximity information as input,which means nearly all the state-of-art network embedding algorithms and not limited to LINE."Including more high degree nodes",and "estimating subgraph edge count using ’preferential attachment’" both could easily be adapted to other scale-free networks.
Keywords/Search Tags:social network analysis, network embedding, graph sampling
PDF Full Text Request
Related items