Design And Implementation Of Expert Homepage Information Extraction System

Posted on:2020-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2427330626450730

Subject:Software engineering

Abstract/Summary:

Industry-university-research cooperation is an important means of improving the high-tech innovation capability of China's small and medium-sized enterprises. However, it faces difficulties in talent introduction. The main causes are the disconnect between government and academia and the information asymmetry between research institutions and enterprises. Expert homepage information on the Internet can help users learn about experts and support talent introduction work. However, expert homepages are scattered across many sites, and their text descriptions are often unclear. It is therefore necessary to integrate the relevant homepage resources and extract useful information from them, giving users a more convenient and accurate way to review expert information. To this end, this thesis designs and implements an expert homepage information extraction system based on Web information extraction technology. The system is essentially a sub-module of the expert information platform and fills in the framework of the expert portrait in that platform. Here, the expert portrait is defined as a visualization page that describes an expert's general profile, research directions, and so on, populated with the information extracted in this thesis. The main work of this thesis is as follows:

(1) Starting from the list of experts supplied by the platform, the system automatically identifies each expert's homepage site from web search results. It then combines HTML structure with Chinese and English grammar to locate, filter, and normalize web page text, completing data collection.

(2) Data preprocessing includes constructing a corpus, annotating the data set, and selecting feature vectors. The system implements an automatic labeling scheme based on the results of text parsing and rule matching. Considering the domain-specific semantics of the text and its context structure, Word2Vec, TF-IDF, part-of-speech (POS), named entity recognition (NER), and other indicators are introduced to complete feature vector selection.

(3) For homepage information extraction, this thesis proposes a scheme that determines candidate fields by part of speech and outputs field labels through SVM and GBDT classification models. To improve overall extraction performance, weighted voting fusion across multiple groups of models is implemented by varying model parameters and training set sampling.

(4) The extracted information must be integrated before it can be filled into the corresponding expert portrait. An information integration algorithm is proposed that takes into account each field's position in its original context, its output label, and other parameters. The thesis also specifies the constituent elements and placement of each type of information in the expert portrait, and realizes the visual display through a unified page design and structured data.
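The weighted voting fusion mentioned in item (3) can be illustrated with a minimal sketch. The abstract does not say how the weights are derived or which field labels the system uses, so the function name `weighted_vote`, the label strings, and the weight values below are all hypothetical, chosen only to show the general technique. Each model casts a vote for one label, votes are scaled by the model's weight, and the label with the highest total wins.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Fuse per-model label predictions for one candidate field.

    predictions: one predicted label per model (hypothetical label names).
    weights: one non-negative weight per model (assumed, e.g. validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Highest total weight wins; ties go to the alphabetically first label
    # so that the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Three hypothetical models (e.g. SVM or GBDT variants trained with
# different parameters or training-set samples) label the same field.
model_outputs = ["research_direction", "affiliation", "research_direction"]
model_weights = [0.5, 0.6, 0.3]
fused = weighted_vote(model_outputs, model_weights)
print(fused)
```

In this example the single most heavily weighted model votes for `affiliation`, but the two models voting for `research_direction` together outweigh it, which is the effect fusion is meant to achieve.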

Keywords/Search Tags:

Expert portrait, web information extraction, text analysis, model fusion

Related items

1	Research On Expert Information Acquisition And Expert Recommendation Methods For Enterprise Demand
2	Research And Application Of Chinese Multi-relation Extraction Based On Fusion Model
3	Research On Difficulty Prediction Model Of Examination Questions Based On Text Extraction Of Association Information
4	An Analysis Of Obstacles To The Extraction Of Text Information In High School Students' Geography Learning
5	Keywords Extraction Based On News Text
6	The Status Of Research And Analysis Of High School Students Of Biological Information Extraction Ability
7	Adaptive Web Information Extraction Research Based On Connectivism
8	Research On Structured Extraction Of Recruitment Text Data Based On Deep Learning
9	Portrait Analysis Of Data Practitioners Based On Machine Learning Model
10	The Design And Implementation Of University Portrait System Faced On Industry Academia Research