Research On Author Disambiguation In Scientific Literature

Posted on:2018-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:W J Zheng

Full Text:PDF

GTID:2348330515962778

Subject:Computer Science and Technology

Abstract/Summary:

Online knowledge bases of scientific literature now provide convenient retrieval and study services, but author name ambiguity impairs retrieval accuracy, so author disambiguation (AD) must be solved within the knowledge base. Existing disambiguation methods are built on clustering, but they do not make full use of the relationships between authors. We propose a method based on two-stage hierarchical clustering that handles same-name authors both with different affiliations and with the same affiliation. First, we cluster authors sharing the same name using a heuristic strategy. Second, we make full use of the global co-author relationship: we add co-author relationships during the iterative clustering process and combine them with the authors' property features to achieve disambiguation. The main work of this paper is as follows:

(1) Data preprocessing for author disambiguation. First, because literature data is not unified across different knowledge bases, a collection and extraction framework is developed to extract scientific literature data and store it as structured data. Second, author entities and publication entities are built using RDF triples, and the D2R tool is used to show the relationships between them. Lastly, we analyze the disambiguation ability of the authors' property features for AD.

(2) By constructing a paper-coauthor relationship graph, we propose a graph-based author disambiguation model and build the disambiguation matrix. We propose a property similarity function based on word representation and a graph-based co-author relationship similarity measure. Finally, we propose a linear combination similarity function built from the authors' property features and the co-author relationship. Together with name ambiguity estimation, these are used to calculate the similarity between authors sharing the same name.

(3) Making full use of the co-author relationship, we propose a name disambiguation method based on two-stage hierarchical clustering. In stage 1, to address the sparseness of trusted co-author relationships, we use co-author expansion and co-occurrence to cluster authors sharing the same name. In stage 2, to address the low credibility of the co-author relationship, a global calculation of the authors' relationships is proposed and combined with the linear combination similarity function for further clustering. Experiments show that the proposed method improves accuracy.
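
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the record fields, the weight `alpha`, the `threshold` value, and the sample data are all hypothetical. Plain Jaccard overlap stands in for both the word-representation property similarity and the graph-based co-author similarity, neither of which is specified in the abstract, and name ambiguity estimation is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(c1, c2, alpha=0.5):
    """Linear combination of property and co-author similarity.

    alpha is an assumed weight, not a value taken from the thesis.
    """
    prop = jaccard(c1["words"], c2["words"])
    coauth = jaccard(c1["coauthors"], c2["coauthors"])
    return alpha * prop + (1 - alpha) * coauth


def merge(c1, c2):
    """Union the papers, property words, and co-authors of two clusters."""
    return {key: c1[key] | c2[key] for key in ("papers", "words", "coauthors")}


def stage1(clusters):
    """Heuristic stage: merge clusters that share at least one co-author."""
    clusters = list(clusters)
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if clusters[i]["coauthors"] & clusters[j]["coauthors"]:
                clusters[i] = merge(clusters[i], clusters[j])
                del clusters[j]
                merged = True
                break  # indices changed, so restart the pair scan
    return clusters


def stage2(clusters, threshold=0.15, alpha=0.5):
    """Agglomerative stage: merge the most similar pair until below threshold."""
    clusters = list(clusters)
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: combined_similarity(clusters[p[0]], clusters[p[1]], alpha),
        )
        if combined_similarity(clusters[i], clusters[j], alpha) < threshold:
            break
        clusters[i] = merge(clusters[i], clusters[j])
        del clusters[j]
    return clusters


# Hypothetical papers that all list an author with the same name.
records = [
    {"papers": {"p1"}, "words": {"graph", "clustering"}, "coauthors": {"L. Chen", "Y. Liu"}},
    {"papers": {"p2"}, "words": {"entity", "linking"}, "coauthors": {"L. Chen"}},
    {"papers": {"p3"}, "words": {"graph", "clustering", "rdf"}, "coauthors": {"M. Sato"}},
    {"papers": {"p4"}, "words": {"protein", "folding"}, "coauthors": {"K. Ito"}},
]

result = stage2(stage1(records))
for cluster in result:
    print(sorted(cluster["papers"]))
```

In this toy data, stage 1 joins p1 and p2 through their shared co-author, and stage 2 then attaches p3 because its property words overlap with the merged cluster; p4 shares nothing with the others and remains a separate author.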

Keywords/Search Tags:

author disambiguation, hierarchical clustering, similarity calculation, RDF

Related items

1	The Research Of Chinese Author Name Disambiguation Based On Hierarchical Clustering
2	Design And Implementation Of Author Name Disambiguation System Based On Two Step Clustering
3	The Research On Academic Paper Author Name Disambiguation
4	Research And Implementation Of The Disambiguation Method With The Same Name In The Expert Database
5	A Study On Methods Of Author Name Disambiguation In Academic Literature
6	Graph Neural Network Based Author Name Disambiguation
7	Research On Author Name Disambiguation Algorithm Of Scientific And Technological Papers
8	Scientific Publications Author Name Disambiguation And Entity Linking
9	Research On Incremental Thesis Homonym Disambiguation Method Based On Pre Training Model And Decision Tree
10	Research On The Method Of Author-paper-identification