Design And Implementation Of Topic Web Crawler Based On Financial Knowledge Graph

Posted on:2022-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Xu

Full Text:PDF

GTID:2518306566991309

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet, search engines have become an important tool for obtaining all kinds of information. In recent years, however, general-purpose search engines such as Baidu and Google have struggled to return accurate results because of their wide search coverage. Topic search engines for specific areas can filter out much of the irrelevant information, so that users quickly and accurately obtain the information they need. To help financial practitioners retrieve financial text data accurately and efficiently from a large number of web pages, this paper focuses on the financial field and studies fast and effective web crawler technology. It proposes a method for extracting key phrases from web pages with the assistance of a knowledge graph. To make the topic crawler efficient, relevant pages are selected by combining the link structure, under certain rules, with the semantic similarity between key phrases and topics. The main contents and methods of the study are as follows:

(1) To address the problem of topic description in topic crawler technology, this paper proposes constructing a financial knowledge graph to describe topics. The Bert-BiLSTM-CRF model is used to extract named entities and relationships from finance-related texts, and knowledge fusion is performed on heterogeneous data to resolve inconsistent and missing entity attribute values. In the final step, Neo4j is used for persistent storage of the triple data, completing the construction of a financial knowledge graph named FinGraph.

(2) To address the problem of crawling strategy in topic web crawler technology, a key phrase extraction algorithm based on the knowledge graph is proposed. A semantics-based AP clustering algorithm is applied to the text. The financial knowledge graph is then used to link the words in each cluster to entities in the knowledge graph, mine the potential relationships between words through the semantic network structure, assign edge weights that quantify those relationships, and construct a relational word graph. A key phrase extraction framework is built by integrating the AP clustering algorithm with a graph centrality algorithm. The aim is to select the pages related to financial topics and reduce interference from irrelevant information, so that the results returned by the topic crawler are highly accurate.

(3) Combining the two research contents above, this paper designs a hybrid topic web crawler that determines a page's topic from both the web page text and the link structure. FinGraph is used to extract key phrases from the web page text, the semantic similarity between the extracted key phrases and the topics is calculated, and the link structure is considered at the same time to filter for the more relevant pages. Finally, FinGraph is further supplemented with the crawled web page text.
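
The hybrid page-selection idea in (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the scoring formula, so the weighted sum, the `alpha` and `threshold` parameters, and all function names here are hypothetical. A simple bag-of-phrases cosine similarity also stands in for the semantic similarity calculation, and the link score is assumed to be the average relevance of already-crawled pages that link to the candidate.

```python
import math
from collections import Counter


def cosine_similarity(phrases_a, phrases_b):
    """Cosine similarity between two bags of phrases.

    Assumption: a stand-in for the semantic similarity used in the thesis.
    """
    a, b = Counter(phrases_a), Counter(phrases_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_score(parent_scores):
    """Average relevance of crawled pages linking to the candidate.

    Assumption: one possible link-structure signal, not the thesis's rule set.
    """
    if not parent_scores:
        return 0.0
    return sum(parent_scores) / len(parent_scores)


def page_relevance(key_phrases, topic_phrases, parent_scores, alpha=0.7):
    """Weighted sum of content similarity and link score (alpha is hypothetical)."""
    content = cosine_similarity(key_phrases, topic_phrases)
    return alpha * content + (1 - alpha) * link_score(parent_scores)


def should_crawl(key_phrases, topic_phrases, parent_scores,
                 threshold=0.3, alpha=0.7):
    """Keep a page only if its combined relevance reaches the threshold."""
    score = page_relevance(key_phrases, topic_phrases, parent_scores, alpha)
    return score >= threshold


if __name__ == "__main__":
    topic = ["stock", "bond", "interest rate"]
    page = ["stock", "interest rate", "dividend"]
    print(page_relevance(page, topic, parent_scores=[0.8, 0.6]))
    print(should_crawl(["recipe", "oven"], topic, parent_scores=[]))
```

In a crawler loop, a score of this kind would typically order the URL frontier so that pages whose key phrases and inbound links both point to the financial topic are fetched first.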

Keywords/Search Tags:

topical crawler, financial field, Bert-BiLSTM-CRF model, knowledge graph, key phrases

Related items

1	Research And Realization Of Topical Crawler Based On Content And Hyperlink
2	Discovering latent topical phrases in document collections and networks with text components: Leveraging text mining and information network analysis for human oriented applications
3	Research And Application Of Tourism Question Answering System Based On Knowledge Graph
4	Research On The Construction Of Chinese Patent Knowledge Graph
5	Stock Price Prediction Model For Integration Knowledge Graph And Emotional Analysis
6	Study On The Construction Of Knowledge Map Of Red Literature Resources In The Perspective Of Digital Humanities
7	Research On Person-post Matching Model Based On Knowledge Graph And Bert
8	Extract Topical Keyphrases From Chinese Text Corpora
9	The Construction And Application Of Commodity Knowledge Graph In The Field Of Catering
10	Research On Topical Crawler Combining Web Page Content And Hyperlink