| Information extraction is an important task in text mining. Scholar webpages are where scholars present their personal information, and extracting that information from these webpages has become a prominent research topic in information extraction. However, existing methods concentrate mainly on contextual associations between words and overlook the interdependencies within the local context of the webpage text. Neglecting these associations leads to incomplete feature acquisition during extraction. Furthermore, the concise expressions used on scholar webpages often give rise to the multiple-attribute problem. For instance, the statement "A scholar pursued their undergraduate and graduate studies at XX University" makes XX University both the scholar’s "undergraduate institution" and their "graduate institution," a case that existing methods for extracting personal information from scholar webpages seldom account for. This limitation significantly degrades the final extraction results. To address these challenges, this research proposes two novel approaches to personal information extraction from scholar webpages: a method that integrates local semantic features and a method based on machine reading comprehension. The former focuses on semantic associations within the local scope of the webpage text. Using gated bilinear neural networks and convolutional neural networks, it fuses the target vector with adjacent local text vectors to obtain higher-dimensional semantic representations, which makes the extraction of scholars’ personal information more effective. Experimental results show that this approach outperforms other existing methods. The latter addresses the multiple-attribute problem in the textual representation of personal information on scholar webpages. It applies machine reading comprehension by designing specific questions for different scholar attributes. Through this question-answering framework, the method makes the attribute of each piece of information explicit during extraction, which mitigates the multiple-attribute problem. Experimental results show that this approach handles texts in which multiple attributes are present and thereby improves the performance of personal information extraction from scholar webpages. |