| In the real world, privacy concerns and related issues keep users' information on different social platforms isolated from one another. This isolation creates data islands and makes it difficult to build a complete portrait of a user, so it is necessary to identify the same user across platforms. Few public datasets exist for cross-social-network research, and there is still no unified, complete dataset on which all user identification algorithms can compare their recognition results. In addition, single-dimensional user information is often sparse or falsified. This thesis therefore designs and proposes a user identification framework based on users' multidimensional information across social networks. The main contents of this thesis are as follows. Dataset construction. This thesis first analyzes the major social networking sites and finds that Foursquare supports cross-site links: users can add their Facebook and Twitter account information to their Foursquare profiles. The seed set can therefore be constructed by crawling Foursquare users who have such external links. A distributed crawler framework is used to crawl the web pages, which handles the websites' anti-crawling mechanisms and rate limits. Feature extraction. Based on the acquired dataset, feature analysis and feature extraction are performed on the data. This thesis analyzes three dimensions of user information: user attributes, user friendships, and user-generated content. A different extraction method is used for each dimension: natural language processing and geographic-location feature extraction for the user attribute and user-generated content dimensions, and node matching for the friendship dimension. User identification and integration. Based on the extracted features, this thesis uses a stacking fusion model, the BERT deep learning model, and other models to integrate and analyze the features extracted from different
dimensions and identify users. To verify the effectiveness of the method, this thesis compares the experimental results of multidimensional user identification with those of other cross-social-network user identification methods. Compared with user identification algorithms based on only a single dimension, the model proposed in this thesis achieves better recognition performance. An ablation study of the model further shows that multidimensional feature extraction is more robust and reliable than single-dimensional feature extraction. The accuracy of the cross-social-network multidimensional feature recognition model can reach more than 90%. Finally, based on this user identification model, a user analysis and identification system is designed and implemented. The system fuses the attributes of accounts identified as belonging to the same natural person into a complete user portrait, providing technical support and data support for downstream tasks such as recommendation systems. |
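
The three feature dimensions described above (user attributes, friendships, and user-generated content) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the account layout (`name`, `location`, `friends`, `posts`), the helper names, and the specific measures (character-level name similarity, haversine distance, Jaccard overlap) are assumptions chosen for illustration and stand in for the NLP, geographic, and node-matching methods the abstract mentions.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


def name_similarity(a: str, b: str) -> float:
    """Attribute dimension: character-level similarity of two display names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(p: tuple, q: tuple) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def friend_overlap(friends_a: set, friends_b: set, seed_links: dict) -> float:
    """Friendship dimension: map A's friends onto platform B through known
    seed pairs (hypothetical dict: id on A -> id on B), then compare sets."""
    mapped = {seed_links[f] for f in friends_a if f in seed_links}
    return jaccard(mapped, set(friends_b))


def tokens(posts: list) -> set:
    """Content dimension helper: lowercase whitespace tokens of all posts."""
    return {word for post in posts for word in post.lower().split()}


def pair_features(a: dict, b: dict, seed_links: dict) -> list:
    """Feature vector for one candidate account pair (assumed dict layout)."""
    distance = haversine_km(a["location"], b["location"])
    return [
        name_similarity(a["name"], b["name"]),
        1.0 / (1.0 + distance),  # closer home locations score nearer to 1
        friend_overlap(a["friends"], b["friends"], seed_links),
        jaccard(tokens(a["posts"]), tokens(b["posts"])),
    ]
```

A vector like this, computed for each candidate pair, is the kind of input a fusion classifier such as the stacking model could consume. The fusion step itself is omitted here because the abstract does not specify its configuration.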