
Career Research Of Scholars Based On Big Data Of Science And Technology

Posted on: 2022-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Shao
Full Text: PDF
GTID: 1528307061972839
Subject: Computer application technology
Abstract/Summary:
Science and technology big data (STBD) generally refers to large-scale data related to science and technology. This kind of data is scientific, systematic, rigorous, and inheritable, and it reflects specific laws of scientific and technological development. STBD records the historical process of scientific development and the trends and laws of technological evolution, and it also contains the history and tendencies of scholars’ career mobility. Mining scholars’ careers from STBD makes it possible to explore how scholar groups and individuals evolve in time, space, research interest, and cooperative relationships. However, this field has only emerged with the era of big data and artificial intelligence and is still at an early research stage, with many problems unexplored. In addition, several key technical problems remain unsolved, such as research institution alignment, generation of scholars’ career trajectories, modeling and prediction of scholars’ careers, exploration of subject areas through scholars’ careers, and inspection of scholars’ mobility in specific areas. This dissertation carries out a series of studies and experiments to solve some of these critical technical problems and to explore key issues in the field. The main research results are as follows.

(1) Existing institution name disambiguation methods cannot cope with the continuously growing and changing data in STBD. To solve this problem, we propose ELAD, a knowledge graph-based institution name disambiguation framework. Its advantages include reducing manual intervention, establishing links between entities, describing entities effectively, and learning from the knowledge in the knowledge graph. The framework includes two sub-modules: the Candidate Generation Algorithm and the Result Selection Algorithm. The Candidate Generation Algorithm maps affiliation strings in papers to entities in the knowledge graph, mainly by using a language model, and generates the set of all possible candidate institution entities. The Result Selection Algorithm maps the likelihood of each candidate in the candidate set to a probability space by introducing algorithms such as LCS (Longest Common Subsequence) and MED (Minimum Edit Distance), and then applies the principle of information entropy to select the most likely institution entity. In a real data scenario, the experimental results show that the proposed framework is superior to traditional methods on all performance indicators, and that the number of requests to the knowledge graph does not increase linearly with data volume, so the framework can be applied in industrial scenarios.

(2) Existing approaches for generating scholars’ career trajectories have difficulty mapping scholars’ geographical locations from their papers, are disturbed by erroneous data, and produce trajectories of low accuracy. To solve these problems, we propose a framework named ATraj RN, which generates scholars’ career trajectories from the redundant and noisy data of their research outcomes. This framework avoids integrating heterogeneous data and effectively mines scholars’ careers from the data of their scientific research outcomes. It proposes three key technologies: 1) an algorithm for positioning based on the academic achievements of scholars (the PAAS algorithm); 2) a method for calculating the workplace distribution probability based on statistical features and deep learning; and 3) the Trajectories Generation algorithm. ATraj RN makes full use of the redundant papers of senior scholars and their complex cooperative relationship networks to overcome the propagation of errors caused by disambiguation errors of scholars and institutions, and it predicts the location of scholars in years without scientific research outcomes. Experiments show that ATraj RN achieves higher accuracy as well as lower time and space complexity when generating the trajectories of senior scholars. Applying adaptive transfer learning to detect abnormal job-hopping behavior helps quickly identify scholars affected by name disambiguation errors, which ensures the accuracy of the generated trajectories.

(3) To predict the job-hopping of scholars, we propose SJHPre, an attention-based Graph Neural Network model that predicts scholars’ job-hopping behavior from dynamic data. SJHPre models scholars’ historical work experience sequences and the scholar-scholar collaboration network in complex STBD, which avoids joining a large amount of heterogeneous data and addresses the problems of feature vectorization and model embedding. SJHPre introduces an attention mechanism to generate attention-aware features and a graph neural network to integrate scholars’ information with their changing academic and social networks. The model can therefore combine scholar information with academic social network information and learn from both the scholar’s own work experience and the latent representations in the collaboration network, which helps improve its accuracy. Extensive experiments on two real-world, large-scale, open data sets show that the proposed model significantly outperforms competing techniques and achieves state-of-the-art results. Further experiments explore the influence of different scholar groups, model parameters, and other factors on the prediction of job-hopping behavior.

(4) To address the difficulty of in-depth mining of career-related information for scholars in the field of artificial intelligence, we propose an STBD-based career analysis methodology for scholars in specific fields. Based on this methodology, we first construct academic knowledge graphs for the field. We then design a data mining and analysis scheme covering research trend investigation, exploration of scholars’ spatio-temporal relations, and analysis of cooperative network evolution. Finally, we propose key technical solutions and apply them in application analysis. For research trend investigation, we design a keyword identification and extraction algorithm for academic papers, then design an exploration method that considers both hot research directions and the evolution of hot technologies, and finally explore the laws and trends of research in this field over time. For the exploration of scholars’ spatio-temporal relations, we design a spatio-temporal information mining algorithm for scholars’ careers based on the work in (2), (3), and (4), then design an analysis method for scholars’ distribution and mobility, and finally explore the relationship between scholars’ mobility, mobility trends, and scholars’ creativity. For the analysis of cooperative network evolution, we first design a cooperative network analysis method based on collaborative papers, then carry out quantitative and visual analysis based on “small world” theory, and finally explore the impact of mobility on cooperative relationships. This dissertation analyzes and demonstrates the laws and trends in the development of the field of artificial intelligence and provides empirical material for exploring the laws of scientific development in specific fields.
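As an illustration of the Result Selection step described in (1), the following is a minimal sketch of how LCS and MED scores for a set of candidate institution names might be mapped to a probability space and filtered with an entropy criterion. The abstract does not give the actual formulas used in ELAD, so everything below is an assumption made for illustration: the equal weighting of the two scores, the softmax normalization, the `temperature` and `max_entropy` parameters, and all function names are hypothetical.

```python
import math


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(query: str, candidate: str) -> float:
    """Assumed score in [0, 1]: equal-weight mix of normalized LCS and MED."""
    q, c = query.lower(), candidate.lower()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    lcs_score = lcs_length(q, c) / longest
    med_score = 1.0 - edit_distance(q, c) / longest
    return 0.5 * (lcs_score + med_score)


def select_candidate(query, candidates, temperature=0.1, max_entropy=0.8):
    """Return (chosen candidate or None, probabilities, normalized entropy).

    Scores are turned into probabilities with a softmax (an assumption).
    If the normalized entropy of that distribution exceeds max_entropy,
    the candidates are considered too ambiguous and None is returned.
    """
    if not candidates:
        return None, [], 0.0
    scores = [similarity(query, c) / temperature for c in candidates]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    norm = entropy / math.log(len(probs)) if len(probs) > 1 else 0.0
    best = candidates[probs.index(max(probs))]
    return (best if norm <= max_entropy else None), probs, norm


if __name__ == "__main__":
    names = ["Tsinghua University", "Peking University", "Zhejiang University"]
    choice, probs, norm = select_candidate("Tsinghua Univ.", names)
    print(choice, [round(p, 3) for p in probs], round(norm, 3))
```

A low normalized entropy means the probability mass is concentrated on one candidate, so the top candidate can be accepted; a value near 1 means the candidates are nearly indistinguishable by string similarity alone, and a real system would presumably fall back on additional evidence such as the knowledge graph.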
Keywords/Search Tags: Data Mining, Deep Learning, Science of Science, Entity Disambiguation, Graph Neural Network, Knowledge Graph, Data Analytics, Sequence Models