| The arrival of the big data era has greatly accelerated the updating of knowledge. Network information offers many kinds of knowledge in many forms, which makes knowledge objects ambiguous. How to obtain necessary, correct, and unambiguous knowledge from this vast ocean of knowledge is an urgent problem. Name ambiguity, the phenomenon in which one name corresponds to several real individuals, is a typical knowledge object ambiguity problem. In scientific research, author name ambiguity not only reduces the accuracy of literature and web retrieval but also affects data mining and other research. The aim of name disambiguation is to separate the real individuals mixed together under a shared name. Considering the complexity of cooperative relations in scientific papers, and in order to measure the true degree of similarity between papers more accurately, this paper proposes two name disambiguation algorithms: a Multi-path Walk Based on Coauthorship Association Graph (MWCAG) algorithm and a P-Sim Rank algorithm based on the bipartite graph. (1) Traditional textual similarity methods cannot accurately measure the complex relationships among coauthors, whereas a Coauthorship Association Graph (CAG) can propagate link relationships between coauthors; building on this property, we propose the MWCAG algorithm. First, MWCAG constructs a Coauthorship References List (CRL) from coauthor information. Second, the CAG is constructed from the CRL, and coauthor similarity is computed with a simple, effective multi-path strategy that is optimized for the name disambiguation problem. Third, the similarity of venues and titles is computed with a textual similarity method. Fourth, because paper sets differ in size, their similarity values lie on different scales, so dynamic hierarchical clustering is applied to handle this. We evaluated MWCAG on real DBLP data with a highly standardized format, and the experimental results show that MWCAG achieves both high precision and high recall.
(2) Observing that coauthors are also indirectly associated with one another, and exploiting the topological structure of coauthorship bipartite graphs, we propose the P-Sim Rank algorithm based on the bipartite graph. Because the original SimRank algorithm cannot be applied directly to name disambiguation, we make two improvements: 1) when SimRank is applied to complete bipartite graphs, the similarity between nodes with different numbers of neighbors is inaccurate, so an evidence factor is introduced to correct the similarity values; 2) differences in the size of paper sets lead to differences in similarity, so a penalty factor is introduced to balance similarity across sets. We evaluated P-Sim Rank on the real DBLP data as well, and the experimental results show that P-Sim Rank also achieves both high precision and high recall. |
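The abstract does not define the CRL, the CAG, or the multi-path strategy precisely, so the following Python sketch of MWCAG's coauthor-similarity step rests on assumptions of our own. The CRL is taken to be each paper's coauthor list with the ambiguous name removed. The CAG is assumed to link two coauthors whenever they appear on the same paper. The coauthor similarity of two papers is assumed to give 1 for each shared coauthor and, for every other coauthor pair, to add `decay ** length` for each simple path of up to `max_len` edges between them. All function names and the `max_len` and `decay` parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_crl(papers, target):
    """Hypothetical CRL: paper id -> sorted coauthors, with the ambiguous name removed."""
    return {pid: sorted(set(authors) - {target}) for pid, authors in papers.items()}


def build_cag(crl):
    """Hypothetical CAG: undirected graph linking every two coauthors of the same paper."""
    graph = defaultdict(set)
    for coauthors in crl.values():
        for a, b in combinations(coauthors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_simple_paths(graph, src, dst, max_len):
    """Return {length: count} of simple paths from src to dst with at most max_len edges."""
    counts = defaultdict(int)

    def dfs(node, depth, visited):
        if depth == max_len:
            return
        for nb in graph.get(node, ()):
            if nb in visited:
                continue
            if nb == dst:
                counts[depth + 1] += 1  # a path stops as soon as it reaches dst
            else:
                visited.add(nb)
                dfs(nb, depth + 1, visited)
                visited.remove(nb)

    dfs(src, 0, {src})
    return counts


def coauthor_similarity(graph, co1, co2, max_len=3, decay=0.5):
    """Shared coauthors add 1 each; other pairs add decay**length per simple path."""
    score = 0.0
    for a in co1:
        for b in co2:
            if a == b:
                score += 1.0
                continue
            for length, n in count_simple_paths(graph, a, b, max_len).items():
                score += n * decay ** length
    return score


# Toy collection of four papers that all list the ambiguous name "J. Smith".
papers = {
    "p1": ["J. Smith", "A. Lee", "B. Chen"],
    "p2": ["J. Smith", "B. Chen", "C. Wu"],
    "p3": ["J. Smith", "D. Kim"],
    "p4": ["J. Smith", "C. Wu", "E. Park"],
}
crl = build_crl(papers, "J. Smith")
cag = build_cag(crl)
print(coauthor_similarity(cag, crl["p1"], crl["p2"]))  # 2.25
print(coauthor_similarity(cag, crl["p1"], crl["p4"]))  # 1.125
print(coauthor_similarity(cag, crl["p1"], crl["p3"]))  # 0.0
```

In this toy data, p1 and p4 share no coauthor, yet they still receive a positive score because B. Chen and C. Wu are linked through p2. This illustrates how a graph can propagate link relationships that a purely textual comparison of coauthor names would miss. Simple-path enumeration is exponential in `max_len` on dense graphs, so a short path limit is assumed.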
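The abstract says only that venue and title similarity is computed with "a textual similarity method". One common choice is cosine similarity between term-frequency vectors, sketched here as an assumption rather than as the measure the thesis uses. The tokenizer and function names are hypothetical, and a TF-IDF weighting could be substituted without changing the interface.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0  # empty strings get similarity 0


title_sim = cosine_similarity("Name Disambiguation in DBLP", "name disambiguation in dblp")
venue_sim = cosine_similarity("IEEE Trans. Knowl. Data Eng.", "SIGMOD Conference")
print(title_sim, venue_sim)  # 1.0 0.0
```

Because abbreviated and full venue names share few tokens, a practical system would probably normalize venue strings before comparing them. That normalization step is not described in the abstract and is not shown here.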
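The abstract motivates dynamic hierarchical clustering by noting that similarity values from paper sets of different sizes lie on different scales, but it does not define the procedure. The sketch below assumes average-linkage agglomerative clustering whose stopping threshold is derived from the data itself, namely `alpha` times the mean pairwise similarity of the set. Under this rule, rescaling every similarity by a positive constant leaves the clustering unchanged. The threshold rule, the `alpha` parameter, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def dynamic_threshold(sim, alpha=1.0):
    """Hypothetical rule: alpha times the mean of all off-diagonal similarities."""
    values = [sim[i][j] for i, j in combinations(range(len(sim)), 2)]
    return alpha * mean(values) if values else 0.0


def average_link(a, b, sim):
    """Mean similarity between every paper in cluster a and every paper in cluster b."""
    return mean(sim[i][j] for i in a for j in b)


def hierarchical_cluster(sim, alpha=1.0):
    """Greedy average-linkage merging until the best link falls below the threshold."""
    threshold = dynamic_threshold(sim, alpha)
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = None, None
        for x, y in combinations(range(len(clusters)), 2):
            s = average_link(clusters[x], clusters[y], sim)
            if best is None or s > best:
                best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting index y does not shift x
        clusters[x] += clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Papers 0 and 1 look alike, as do papers 2 and 3; cross-pair similarity is low.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
scaled = [[10 * v for v in row] for row in sim]
print(hierarchical_cluster(sim))     # [[0, 1], [2, 3]]
print(hierarchical_cluster(scaled))  # [[0, 1], [2, 3]]
```

A fixed global threshold such as 0.5 would give the same answer on `sim` but would merge everything on `scaled`. A data-dependent threshold is one simple way to keep the stopping point comparable across paper sets whose similarity values have different magnitudes.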
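SimRank on a bipartite graph alternates between the two node sets: two papers are similar if their coauthors are similar, and two coauthors are similar if their papers are similar. The sketch below implements this standard bipartite SimRank iteration with decay constant `c`. As an assumption about what the evidence factor looks like, it multiplies each paper-paper score by the evidence term used in SimRank++, the sum of 1/2^i for i = 1..k, where k is the number of common neighbors. The abstract gives neither the thesis's exact evidence factor nor the form of its penalty factor, so the penalty factor is not modeled here. All function names are hypothetical.

```python
from collections import defaultdict


def _update(x, y, neighbors, other_sim, c):
    """One SimRank step for pair (x, y), using the other side's previous scores."""
    if x == y:
        return 1.0
    nx, ny = neighbors[x], neighbors[y]
    total = sum(other_sim[(u, v)] for u in nx for v in ny)
    return c * total / (len(nx) * len(ny))


def bipartite_simrank(edges, c=0.8, iterations=10):
    """edges: (paper, coauthor) pairs. Returns paper scores, coauthor scores, paper neighbors."""
    paper_nb, author_nb = defaultdict(set), defaultdict(set)
    for paper, author in edges:
        paper_nb[paper].add(author)
        author_nb[author].add(paper)
    s_p = {(x, y): float(x == y) for x in paper_nb for y in paper_nb}
    s_a = {(x, y): float(x == y) for x in author_nb for y in author_nb}
    for _ in range(iterations):
        # Both sides are computed from the previous iteration's scores.
        s_p, s_a = (
            {(x, y): _update(x, y, paper_nb, s_a, c) for x, y in s_p},
            {(x, y): _update(x, y, author_nb, s_p, c) for x, y in s_a},
        )
    return s_p, s_a, paper_nb


def evidence(x, y, neighbors):
    """SimRank++-style evidence: sum of 1/2**i for i = 1..(number of common neighbors)."""
    common = len(neighbors[x] & neighbors[y])
    return sum(0.5 ** i for i in range(1, common + 1))


def evidence_weighted(s_p, paper_nb, x, y):
    """Paper-paper SimRank score scaled by the evidence factor."""
    return evidence(x, y, paper_nb) * s_p[(x, y)]


# Two complete bipartite graphs: q1, q2 share one coauthor, or share two coauthors.
one_shared = [("q1", "a1"), ("q2", "a1")]
two_shared = [("q1", "a1"), ("q1", "a2"), ("q2", "a1"), ("q2", "a2")]
s1, _, nb1 = bipartite_simrank(one_shared)
s2, _, nb2 = bipartite_simrank(two_shared)
raw = (s1[("q1", "q2")], s2[("q1", "q2")])
weighted = (
    evidence_weighted(s1, nb1, "q1", "q2"),
    evidence_weighted(s2, nb2, "q1", "q2"),
)
print(raw, weighted)
```

With `c = 0.8`, the two papers that share one coauthor score 0.8, while the two papers that share two coauthors converge toward 2/3. Plain SimRank therefore ranks the pair with more common neighbors lower, which is the kind of inaccuracy on complete bipartite graphs described above. After evidence weighting, the order flips to 0.4 versus roughly 0.5.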