The Research On Designs And Applications Of Random Walk-based Unsupervised Network Representation Learning Algorithms
Posted on: 2023-04-16 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: X Xu | GTID: 1520307031452704 | Subject: Computer software and theory

Abstract/Summary:

Unsupervised network representation learning algorithms embed ubiquitous unlabeled network data into low-dimensional dense vectors, which provides important support for machine learning models that solve unlabeled network analysis tasks. As an effective way to describe complex problems in the real world, networks are widely used in practice, for example social networks, protein reaction networks, and urban traffic networks. With the rapid development of machine learning research, completing network analysis tasks with machine learning models has attracted extensive attention. Embedding the information in an unlabeled network into low-dimensional dense vectors can improve the performance of machine learning models in downstream tasks and speed up model training. Unsupervised network representation learning has therefore become key to processing unlabeled network data with machine learning models.

Random walk-based unsupervised network representation learning algorithms play an essential role in network-related theoretical research and practical applications because they capture the local structure of the network intuitively and effectively and are easy to parallelize. Existing algorithms learn network representation vectors by designing complex random walk strategies. Although these algorithms have been widely studied, they still face several challenges: the inability to use neighbor information, difficulty in learning mixed heterogeneous information, hyperparameter dependence, and weak interpretability. In response to these challenges, this dissertation carried out the following research work:

(1) An unsupervised network representation learning algorithm based on the DeepWalk algorithm and a neighborhood aggregation operation is proposed. The algorithm combines random walks and neighborhood aggregation hierarchically and uses the advantages of both to obtain network representation vectors that improve the performance of machine learning models in network analysis tasks. It embeds the local topological structure of network nodes into one set of low-dimensional vectors through DeepWalk random walks, and then embeds the neighbor information of nodes into another set of low-dimensional vectors through neighborhood aggregation. The two sets of vectors, which lie in different feature spaces, are fused into a unified feature space to obtain the representation vectors of the network nodes. The algorithm is compared with six popular unsupervised network representation learning algorithms on single-label and multi-label node classification tasks over four real data sets from different fields. The experimental results show that, in terms of classification accuracy, the algorithm based on DeepWalk and neighborhood aggregation outperforms the baselines.

(2) An unsupervised network representation learning algorithm based on a probabilistic acceptance walk is proposed. The algorithm relies on the property that a random walk on a connected aperiodic network converges to a unique stationary distribution, and it designs stationary distributions according to the different kinds of information in the network. The random walk is then guided by the stationary distribution to sample node sequences. Finally, the Skip-gram model learns the representation vectors from the node sequences. To avoid designing a corresponding transition probability for each stationary distribution, a probabilistic acceptance walk strategy is proposed. By introducing a transition acceptance rate calculated from the transition probability and the stationary distribution, this strategy theoretically guarantees that the walk obeys the target distribution whatever transition probability is used. Principal component analysis is used to fuse the representation vectors containing different network information into a unified low-dimensional feature space to obtain the final network representations. The experimental results show that the algorithm based on the probabilistic acceptance walk performs better than many classical unsupervised network representation learning algorithms used as baselines.

(3) An unsupervised network representation learning algorithm based on the probabilistic acceptance walk and weighted neighborhood aggregation is proposed. The algorithm uses the probabilistic acceptance walk to embed the nodes in a feature space, and then uses the positions of the nodes in that feature space to calculate each node's neighborhood representation vector through weighted neighborhood aggregation. To obtain the weights of neighbor nodes in the aggregation operation, two distance-based weight calculation methods are proposed. On the node classification task, the algorithm is compared with a variety of classical unsupervised network representation learning algorithms. The experimental results show that it achieves better classification results than the baseline algorithms.

(4) A framework for predicting medical competence based on network representation learning is proposed. The framework builds a "doctor-disease" relationship network from electronic medical records and uses network representation learning algorithms and word representation learning algorithms to embed topological structure information and text sentiment information into low-dimensional vectors that jointly represent the relationship between doctors and patients. A weighted finite mixture model is used to learn the mapping between doctor-disease representations and patient evaluations. Based on the doctor-disease representation, the framework can predict a doctor's performance in treating a disease. The framework is verified on a data set from an Internet medical service platform. The experimental results show that it predicts doctors' performance in treating specific diseases well.

Keywords/Search Tags: Machine Learning, Unsupervised Network Representation Learning, Random Walk, Stationary Distribution, Complex Network
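
The acceptance step described in item (2) can be illustrated with a small sketch. It is not the dissertation's algorithm: it is a standard Metropolis-Hastings-style walk, chosen here because it also corrects an arbitrary proposal with an acceptance rate computed from the transition probability and the target stationary distribution. The function name `acceptance_walk`, the toy graph `adj`, the distribution `target`, and the uniform-neighbor proposal are all assumptions made for illustration.

```python
import random
from collections import Counter


def acceptance_walk(adj, target, start, length, rng):
    """Sample a node sequence whose visit frequencies approach `target`.

    Assumed proposal: jump to a uniformly chosen neighbor,
    q(u -> v) = 1 / deg(u).
    Acceptance rate: min(1, target[v] * q(v -> u) / (target[u] * q(u -> v))).
    A rejected proposal keeps the walker on the current node.
    """
    walk = [start]
    u = start
    for _ in range(length - 1):
        v = rng.choice(adj[u])
        # q(v -> u) / q(u -> v) = deg(u) / deg(v) for the uniform proposal.
        ratio = (target[v] * len(adj[u])) / (target[u] * len(adj[v]))
        if rng.random() < min(1.0, ratio):
            u = v
        walk.append(u)
    return walk


# Hypothetical undirected, connected toy network (adjacency lists).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
# Hypothetical target stationary distribution over the four nodes.
target = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

walk = acceptance_walk(adj, target, start=0, length=200_000,
                       rng=random.Random(0))
counts = Counter(walk)
freq = {node: counts[node] / len(walk) for node in adj}
print(freq)  # empirical visit frequencies, to compare against `target`
```

In a full pipeline, sequences sampled this way would be passed to a Skip-gram model to learn the node vectors, as the abstract describes; that step is omitted here.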