Research On Personal Information Extraction Methods For Scholars' Web Page

Posted on:2024-07-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Tian

Full Text:PDF

GTID:2568307130475974

Subject:Software engineering

Abstract/Summary:

Information extraction is an important task in text mining. Scholar webpages are where scholars present their personal information, and extracting personal information from these pages has become a prominent research topic in information extraction. However, existing methods concentrate mainly on contextual associations between words and overlook the dependencies within the local context of the webpage text, which leaves the extracted features incomplete. In addition, the concise wording of scholar webpages often gives rise to the multiple-attribute problem. For instance, the statement "A scholar pursued their undergraduate and graduate studies at XX University" makes XX University both the scholar's "undergraduate institution" and "graduate institution," a case that existing methods for extracting personal information from scholar webpages seldom account for. This limitation significantly degrades the final extraction results.

To address these challenges, this research proposes two approaches for personal information extraction from scholar webpages: a method integrating local semantic features and a method based on machine reading comprehension.

The former focuses on semantic associations within the local scope of the webpage text. Using gated bilinear neural networks and convolutional neural networks, it fuses the target vector with adjacent local text vectors to obtain higher-dimensional semantic representations, which improves the extraction of scholars' personal information. Experimental results show that this approach outperforms other existing methods.

The latter addresses the multiple-attribute problem in how personal information is expressed on scholar webpages. It applies machine reading comprehension by designing specific questions for different scholar attributes. Through this question-and-answer framework, the method makes explicit which attribute each piece of information belongs to, mitigating the multiple-attribute problem. Experimental results show that this approach handles texts in which multiple attributes are present and thereby improves the extraction of personal information from scholar webpages.
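The question-per-attribute idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the question templates, the attribute names, and the `toy_reader` function are all hypothetical, and `toy_reader` is a keyword-and-regex stand-in for a trained machine reading comprehension model. The point it shows is only that asking one question per attribute lets the same text span answer several attributes.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical attribute-specific questions; the thesis does not list its templates.
QUESTION_TEMPLATES: Dict[str, str] = {
    "undergraduate_institution": "Where did the scholar complete undergraduate studies?",
    "graduate_institution": "Where did the scholar complete graduate studies?",
}


def toy_reader(question: str, context: str) -> Optional[str]:
    """Stand-in for a trained MRC model: a keyword check plus a regex span."""
    cue = "undergraduate" if "undergraduate" in question else "graduate"
    # Word boundaries keep "graduate" from matching inside "undergraduate".
    if not re.search(rf"\b{cue}\b", context):
        return None
    match = re.search(r"\bat ((?:[A-Z]\w* )+University)", context)
    return match.group(1) if match else None


def extract_attributes(
    context: str,
    reader: Callable[[str, str], Optional[str]] = toy_reader,
) -> Dict[str, str]:
    """Ask one question per attribute and keep every question that gets an answer."""
    answers: Dict[str, str] = {}
    for attribute, question in QUESTION_TEMPLATES.items():
        span = reader(question, context)
        if span is not None:
            answers[attribute] = span
    return answers


if __name__ == "__main__":
    text = "The scholar pursued undergraduate and graduate studies at XX University."
    print(extract_attributes(text))
```

Because each attribute is queried independently, "XX University" is returned for both the undergraduate and the graduate question, whereas a scheme that assigns a single label to each span would have to pick one. In practice the `reader` argument would be replaced by a trained extractive reading comprehension model.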

Keywords/Search Tags:

Information extraction, Neural network, Reading comprehension, Pre-training model

Related items

1	Research On Machine Reading Comprehension Methods Based On Incorporating The History Of Conversation
2	A Study Of Self-training Methods For Machine Reading Comprehension Span Extraction Tasks
3	Research On Chinese Multi-text Reading Comprehension Model Based On Neural Network
4	Research On Reading Comprehension Of Pre-Training Language Model Fused With Knowledge
5	Research On Reading Comprehension Method Based On Graph Neural Network
6	Research On Machine Reading Comprehension Method Based On Deep Learning
7	Research On Reasoning Machine Of Reading Comprehension Model With Multiple Choice
8	Application Of Knowledge-embedded Pre-training Model In Reading Comprehension
9	Research On Conversational Machine Reading Comprehension Based On Time Sequence Information
10	Machine Reading Comprehension Model Design Based On Specific Dataset