Research on Real-time Scholar Homonym Disambiguation Enhanced by Subgraph Structure

Posted on: 2024-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Han
Full Text: PDF
GTID: 2568307130458434
Subject: Software engineering
Abstract/Summary:
Scholar name disambiguation resolves the ambiguity that arises when several scholars share the same name, so that the publication records of different same-named scholars are not mixed up. The problem matters both for academic research and for the operation of online academic literature retrieval platforms. In a large batch of newly collected literature, a single author name may correspond to several same-named scholar profiles that the platform has already organized. Assigning papers with ambiguous names to the correct scholars is therefore a pressing concern for major academic platforms. Real-time scholar name disambiguation aims to link each newly added paper that carries an ambiguous author name to the correct author among the same-named candidates, both accurately and in real time.

Mainstream methods for this problem use feature engineering and text embedding matching to capture semantic information between the paper to be disambiguated and each same-named candidate author. However, they cannot efficiently and effectively exploit the graph structure information contained in papers and scholars. In addition, as the number of same-named researchers working in the same field grows rapidly, the real-time disambiguation task becomes increasingly complex. Relying on semantic features alone easily leads to misassigned papers, and existing research has rarely explored how to combine semantic and structural information. To address these issues, this thesis designs an end-to-end model that efficiently extracts structural features between papers to be disambiguated and candidate authors, together with RND-all, a real-time disambiguation model enhanced by subgraph structure. The main work is as follows:

(1) To obtain structural information between the paper to be disambiguated and the candidate authors, this thesis designs a Subgraph Structure Feature Extraction Model (SSF). First, the paper to be disambiguated and each candidate author are constructed as separate ego networks. A graph attention network then aggregates features over the relationships between the central node and its neighbor nodes, which improves the model's ability to extract structural features. Next, a fine-grained similarity matrix is computed between the graph of the paper to be disambiguated and each candidate author graph, which strengthens the model's matching ability in complex cases. Because the size of the similarity matrix varies with the number of nodes in each subgraph, multiple sets of radial basis function (RBF) kernels convert the interaction information into fixed-dimensional graph-related features, further enriching the representation. The model is trained in a learning-to-rank fashion by distinguishing correct authors from incorrect ones. The thesis analyzes the real-time disambiguation performance of the model by comparing different ranges of subgraph interaction and by examining the effect of the multiple sets of RBF kernels. Experiments show that the structural information extracted in this way discriminates well between different authors.

(2) This thesis proposes RND-all (Real-time Name Disambiguation Model Integrating All The Information), a real-time same-name disambiguation model that fuses semantic and structural information. RND-all combines the two kinds of information through ensemble learning to improve accuracy and generalization, and it introduces structural information to strengthen real-time disambiguation. RND-all first applies feature engineering to attributes such as author name, title, keywords, institution, and conference to obtain handcrafted features. It then uses the academic pre-trained model OAG-BERT to compute embedding matching features between the paper to be disambiguated and the candidate authors. Finally, it adds the graph-related features computed by the subgraph structure feature extraction model. To handle the realistic case in which the paper to be disambiguated belongs to none of the existing candidate authors, this thesis designs a two-level classifier that improves the model's ability to identify such papers. The experimental results demonstrate the effectiveness of the structural features and show that semantic and structural features complement each other. The model ranked first in the Who Is Who name disambiguation competition.
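The kernel step described in (1) can be illustrated with a short sketch. It is not the thesis's implementation. Several details are assumptions introduced here because the abstract does not specify them: the cosine similarity, the kernel centers `mus`, the kernel width `sigma`, the log-sum pooling (borrowed from kernel-based ranking models), the hinge form of the ranking loss, and all function names. Plain vectors stand in for the node embeddings that the thesis obtains from a graph attention network.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(paper_nodes: Sequence[Vector],
                      author_nodes: Sequence[Vector]) -> List[List[float]]:
    """Node-to-node similarities; shape is (#paper nodes) x (#author nodes)."""
    return [[cosine(p, a) for a in author_nodes] for p in paper_nodes]


def kernel_pooling(sim: Sequence[Sequence[float]],
                   mus: Sequence[float],
                   sigma: float = 0.1) -> List[float]:
    """Reduce a variable-size similarity matrix to len(mus) features.

    Each RBF kernel centered at mu soft-counts, per row, how many entries
    lie near mu; the log of each row count is summed over rows.
    (Assumed pooling scheme, not taken from the thesis.)
    """
    features = []
    for mu in mus:
        total = 0.0
        for row in sim:
            soft_count = sum(
                math.exp(-((s - mu) ** 2) / (2.0 * sigma ** 2)) for s in row
            )
            # Clamp before the log so an empty soft count stays finite.
            total += math.log(max(soft_count, 1e-10))
        features.append(total)
    return features


def margin_ranking_loss(score_correct: float,
                        score_wrong: float,
                        margin: float = 1.0) -> float:
    """Pairwise hinge loss: zero once the correct author outscores the
    wrong one by at least `margin` (one common learning-to-rank loss)."""
    return max(0.0, margin - score_correct + score_wrong)
```

Calling `kernel_pooling(similarity_matrix(paper_nodes, author_nodes), mus)` returns a list of length `len(mus)` regardless of how many nodes each ego network has. That fixed length is what allows the graph-related features to be concatenated with the handcrafted and embedding features in a downstream ensemble.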
Keywords/Search Tags:Real-time Name Disambiguation, Graph Neural Network, Structural Information, Ensemble Learning