| Online scientific literature knowledge bases now provide convenient literature retrieval and study services, but author name ambiguity arises and impairs retrieval accuracy. Author disambiguation (AD) is therefore a problem that knowledge bases need to solve. Existing disambiguation methods are built on clustering. Because existing clustering-based methods do not make full use of the relationships between authors, we propose a method based on two-stage hierarchical clustering that handles same-name authors both with different affiliations and with the same affiliation. First, we cluster authors sharing the same name using a heuristic strategy. Second, we make full use of the global co-author relationship: we add the co-author relationship to the iterative clustering process and combine it with the authors’ property features to achieve disambiguation. The main work of this paper is as follows: (1) Data preprocessing for author disambiguation. First, because literature records are not unified across different literature knowledge bases, a collection and extraction framework is developed to extract scientific literature data and store it as structured data. Second, the author entity and the publication entity are built using RDF triples, and the D2R tool is used to show the relationship between author entities and publication entities. Lastly, we analyze the disambiguation ability of the authors’ property features for AD. (2) By constructing the paper-coauthor relationship graph, we propose a graph-based author disambiguation model and build the disambiguation matrix. We propose a property similarity function based on word representation and a graph-based co-author relationship similarity measure. We then propose a linear combination similarity function based on the authors’ property features and the co-author relationship. We use the authors’ property features, the co-author relationship, and name ambiguity estimation to calculate the similarity between authors sharing the same name. (3) Making full use of the co-author
relationship, we propose a name disambiguation method based on two-stage hierarchical clustering. In stage 1, to address the sparseness of the trusted co-author relationship, we use co-author expansion and co-occurrence to cluster authors sharing the same name. In stage 2, to address the low credibility of the co-author relationship, we propose a global calculation of the authors’ relationship and combine it with the linear combination similarity function for further clustering. Experiments show that the proposed method can improve disambiguation accuracy. |
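
The two-stage idea summarized above can be illustrated with a minimal Python sketch. It is not the implementation used in this work, and several details are assumptions made only for illustration: the record fields `coauthors` and `keywords`, the Jaccard overlap used as a stand-in for both the word-representation property similarity and the graph-based co-author similarity, the weight `alpha`, the merge `threshold`, and all function names. Stage 1 merges same-name papers that share a co-author; stage 2 keeps merging clusters while a linear combination of the two similarities stays above the threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stage1_coauthor_merge(papers):
    """Stage 1 (assumed form): group papers that share at least one co-author."""
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(papers)), 2):
        if papers[i]["coauthors"] & papers[j]["coauthors"]:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cluster_similarity(c1, c2, papers, alpha):
    """Linear combination of property and co-author similarity (both Jaccard here)."""
    def pooled(cluster, key):
        return set().union(*(papers[i][key] for i in cluster))

    attr = jaccard(pooled(c1, "keywords"), pooled(c2, "keywords"))
    coauth = jaccard(pooled(c1, "coauthors"), pooled(c2, "coauthors"))
    return alpha * attr + (1 - alpha) * coauth


def stage2_agglomerate(clusters, papers, alpha=0.5, threshold=0.3):
    """Stage 2 (assumed form): merge the most similar pair until below threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, (a, b) = max(
            (cluster_similarity(clusters[a], clusters[b], papers, alpha), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Hypothetical records for four papers signed with the same ambiguous name.
papers = [
    {"coauthors": {"Li Na", "Chen Bo"}, "keywords": {"clustering", "graph"}},
    {"coauthors": {"Chen Bo"}, "keywords": {"graph", "disambiguation"}},
    {"coauthors": {"Zhao Min"}, "keywords": {"clustering", "graph", "disambiguation"}},
    {"coauthors": {"Sun Yu"}, "keywords": {"protein", "folding"}},
]

stage1 = stage1_coauthor_merge(papers)        # [[0, 1], [2], [3]]
result = stage2_agglomerate(stage1, papers)   # [[0, 1, 2], [3]]
print(result)
```

In this toy data, papers 0 and 1 are joined in stage 1 through the shared co-author, and paper 2 is attached in stage 2 only because its keywords match, even though it has no co-author in common. The weight and threshold would need to be tuned on labeled data; the values here are placeholders.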