
Research On Fine-grained Entity Knowledge Recognition And Enhanced Retrieval For Academic Full Text

Posted on: 2022-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Jiang
Full Text: PDF
GTID: 2558307133487924
Subject: Information Science
Abstract/Summary:
As the volume of academic full text grows, users gain access to vast amounts of academic literature. This abundance, however, has not translated into a corresponding wealth of knowledge: the large amount of irrelevant information makes it increasingly difficult for users to find what they need quickly. Effective knowledge organization and knowledge mining are therefore necessary for academic literature. Yet there has been little research on fine-grained entity knowledge mining and organization, or on enhanced retrieval, for academic literature in the field of information science. This research therefore aims to build an entity knowledge recognition model for academic full text in information science, to extract the meaningful and valuable models, methods, and resources contained in the full text in the form of fine-grained entity knowledge, and to combine this entity knowledge with annotated chapter structure information in an ALBERT-based semantic relevance ranking model that achieves semantically enhanced retrieval. To meet academic users' need for fine-grained knowledge, the literature is organized by chapter structure and entity knowledge, which lets users obtain relevant information quickly, broadens their understanding of research methods, and sparks new research ideas. Set against the background of knowledge organization and indexing based on academic full text, this thesis implements a fine-grained entity knowledge recognition model for academic full text, integrates fine-grained entities and text structure into a pre-trained language model, and proposes a semantically enhanced retrieval model adapted to the academic domain. The work centers on the following three aspects.

(1) Construction of an academic full-text corpus
Full-text articles from the journals JASIST and Scientometrics were selected. Web crawling, HTML parsing, a MySQL database, and related technologies were used for the distributed collection, parsing, and storage of the full texts, which are spread across the Internet, and the texts were then annotated manually. This produced a corpus of entity knowledge in the field of information science and a corpus of text structure. The distribution of entity knowledge is examined from four perspectives: the number of entities of each type, their distribution across chapter structure, their distribution by year of mention, and the joint distribution of entity mentions and literature.

(2) Construction of an entity knowledge recognition model for academic full text
To compare model performance thoroughly and select a well-performing recognition model, entity knowledge recognition models were built with machine learning models (hidden Markov model, conditional random field), deep learning models (bidirectional recurrent neural network, bidirectional long short-term memory network, bidirectional gated recurrent unit network), and pre-trained language models (BERT, SciBERT). BERT achieves the highest overall harmonic mean (F1 score), 85.249%, followed by SciBERT and the deep learning models Bi-GRU-CRF, BiLSTM-CRF, and Bi-RNN-CRF. The HMM has the lowest harmonic mean, only 73.136%, which is 12.113 percentage points below that of BERT.

(3) Construction of an enhanced retrieval model incorporating fine-grained entity knowledge
To organize and rank academic full texts that contain fine-grained entity knowledge and text structure information, and to build a new intelligent, knowledge-based retrieval system, an index containing entity knowledge and text structure was first built with the Elasticsearch retrieval framework. A website for enhanced retrieval and for collecting explicit user feedback, based on fine-grained entity knowledge and text structure, was then built with Python web technology. In addition, an enhanced retrieval model based on a pre-trained language model and suited to the long texts of academic literature is proposed, and entity knowledge is incorporated to construct the ALBERT-Ranking semantic relevance ranking model. Retrieval evaluation metrics including NDCG@1, NDCG@3, NDCG@5, and NDCG@10 improve significantly, by 24.39%, 3.75%, 4.65%, and 4.41% respectively, over the traditional full-text retrieval model.
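
For readers unfamiliar with the NDCG@k metric reported above, the following is a minimal sketch of how it is commonly computed. It is not the thesis's evaluation code: the function names, the linear gain formulation, and the example relevance grades are all assumptions made for illustration, and the abstract does not state which NDCG variant was used.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, log2 discount)."""
    # rank is 0-based, so the first result is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering.

    Assumes `relevances` holds the graded judgments of every judged document,
    listed in the order the system returned them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant).
    baseline_order = [1, 3, 0, 2]   # e.g. a plain full-text ranking
    reranked_order = [3, 2, 1, 0]   # e.g. the same documents after re-ranking
    for k in (1, 3):
        print(k, ndcg_at_k(baseline_order, k), ndcg_at_k(reranked_order, k))
```

Because NDCG@1 depends only on the top result, moving a single highly relevant document into first place can change it sharply, while the effect is diluted at larger cutoffs.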
Keywords/Search Tags:Knowledge Organization and Indexing, Pre-training language model, Information Retrieval, Entity Knowledge, Academic Full Text