| With the development of society and the popularization of information equipment,the spread of ideas and rumors on the social networks and relationship networks formed by people has become more and more rapid.However,due to the privacy of the data and the complexity of the transmission situation,it is unknown and difficult to observe how information spread over the network.Therefore,it is of great practical significance to accurately infer or reconstruct the diffusion network structure formed by the information spread from observable transmission data.This is beneficial for relevant departments to understand the entire propagation process,and intervene in a timely manner to prevent and control their spread.In network science,the latent diffusion network structure is of great theoretical significance to reveal the dynamics process of transmission.For example,by modeling the diffusion network structure,one can learn information such as the scale of transmission,and evaluate the influence of nodes in the network.Further,the structure of diffusion network can provide research basis for many social network applications,such as the propagation source location,influence maximization,etc.To address the problem of diffusion network structure inference,this thesis focuses on the following two aspects.(1)To address the problem of slow network inference speed caused by the redundant initial candidate edges in timestamp-based network inference methods,a diffusion network inference algorithm based on clustering pruning and submodular maximization is proposed.By designing a candidate edge evaluation metric that integrates transmission timestamp and status,the algorithm reduces the impact of redundant edges and improves the speed of network inference.The proposed algorithm can be divided into two main stages,namely,the clustering pruning stage and the network inference stage.In the clustering pruning stage,firstly,a candidate edge set is constructed based on transmission timestamp.Secondly,a clustering metric integrating the information of transmission timestamp and status is designed to evaluate the existence possibility of each candidate edge and reduce the impact of redundant edges.In the network inference stage,a submodular maximization algorithm based on multiple spanning trees is utilized to perform network inference from the pruned candidate edge set.The experimental results show that,with little impact on the inference accuracy of the prototype algorithm,the running time of the proposed algorithm is reduced by 62.23% compared with that of the prototype algorithm on average.(2)To address the problem of insufficient clustering pruning in propagation status-based network inference algorithms,which increases the error of propagation probability estimation and leads to low accuracy of inference,a diffusion network inference algorithm based on clustering pruning and propagation probability estimation is proposed.Based on an improved standard mutual information,the algorithm utilizes a heuristic clustering algorithm to prune potential edges,which reduces the impact of redundant edges on propagation probability estimation.Moreover,the simulated annealing algorithm with mixed temperature is used to optimize the propagation probabilities estimated by the expectation maximization algorithm,which improves the accuracy of network inference.The proposed algorithm can be divided into two main stages,namely,the clustering pruning stage and the propagation probability estimation stage.In the clustering pruning stage.Firstly,an improved standard mutual information is designed as the clustering metric,which measure the correlation of propagation statuses between nodes.Secondly,the heuristic clustering algorithm is used to prune all potential edges to reduce the influence of redundant edges.In the estimation stage of propagation probability.Firstly,an expectation maximization estimation framework is constructed based on the propagation state data.Secondly,in order to accurately estimate the propagation probability between nodes,a mixed temperature simulated annealing method is integrated into the existing expectation maximization estimation framework to optimize the estimated results.Finally,the potential edge set and the corresponding propagation probabilities are modified by setting a filtering threshold for the propagation probability.The experimental results show that the F-Score value of the proposed algorithm increases by 41.68% on average compared with the original algorithm in terms of the inference accuracy of the propagation edge,and the mean square error value of the proposed algorithm decreases by 16.65% on average compared with the original algorithm in terms of the estimated accuracy of the propagation probability. |