
Research On The Construction Method Of Knowledge Graph For Baidu Baike

Posted on: 2020-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Yang
Full Text: PDF
GTID: 2428330578952718
Subject: Computer technology
Abstract/Summary:
With the continuous development of the Internet, the volume of Internet data has grown explosively. Because Internet content is large-scale, diverse, and loosely organized, accessing information and knowledge effectively is a major challenge. With its powerful semantic processing and open organization capabilities, the knowledge graph lays the foundation for knowledge-based and intelligent applications in the Internet age. As society and science develop rapidly, the knowledge graph has gradually become a novel way of managing massive amounts of knowledge.

A knowledge graph is composed of entities, relationships, attributes, and so on. Its basic unit is the triple: each entity corresponds to a node in the graph and each relationship corresponds to an edge, so relationships are displayed clearly. Put simply, a knowledge graph connects different entities through shared attributes or other features and presents them graphically as a relational network. It also opens up a new way to analyze problems from the perspective of "relationships".

This thesis aims to build a knowledge graph based on Baidu Baike. Because Baidu Baike's web page data is complex and diverse, obtaining useful knowledge from a massive number of web pages is a big challenge. Baidu Baike was chosen because it has three major characteristics:

1. It is easy to acquire: each web page introduces only one entity, and the information is detailed and comprehensive.
2. Knowledge extraction is relatively simple: the page format of each entity is relatively uniform and includes many semi-structured information tables, which facilitates subsequent knowledge extraction.
3. Its web content is written by professionals and is of relatively high quality.

The main work carried out in this thesis is as follows:

1. Obtain raw data from Baidu Baike web pages. A web crawler is used to crawl Baidu Baike's semi-structured data and obtain each entity name (the name of the entry) together with the corresponding HTML file. Note that entities and articles in Baidu Baike usually correspond one to one, and an entity generally corresponds to the title of its article. Because the content of Baidu Baike is so rich, this thesis extracts only a subset of triples to study knowledge graph construction.
2. Further process the crawled data: first extract the text, then obtain structured information from it, and finally extract the triples, laying the foundation for the construction that follows.
3. Store and build the knowledge graph in the Neo4j graph database.
4. Visualize the data on a web page: the query results from the back-end database are converted and passed to D3, which draws the graph on the front end so that the data can be queried from the web page.
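The extraction, storage, and visualization steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes, purely for illustration, that an entry's semi-structured information table is marked up as `<dt>` (attribute name) and `<dd>` (attribute value) pairs. The names `InfoboxParser`, `extract_triples`, `to_d3_graph`, `MERGE_TRIPLE`, and the sample HTML are all hypothetical. The sketch turns each attribute/value pair into an (entity, attribute, value) triple, shows one possible parameterized Cypher statement for storing a triple, and converts the triples into the node-link shape commonly fed to a D3 force layout.

```python
import json
from html.parser import HTMLParser

# One possible Cypher statement for storing a triple (assumed schema: every
# node is labeled Entity, and the predicate is kept as a relationship property
# because Cypher relationship types cannot be passed as parameters).
MERGE_TRIPLE = (
    "MERGE (s:Entity {name: $s}) "
    "MERGE (o:Entity {name: $o}) "
    "MERGE (s)-[:RELATION {name: $p}]->(o)"
)


class InfoboxParser(HTMLParser):
    """Collect (attribute, value) pairs from <dt>/<dd> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (attribute, value) pairs
        self._tag = None    # "dt" or "dd" while inside one, otherwise None
        self._buf = []      # text fragments of the current element
        self._name = None   # most recent <dt> text still waiting for its <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <a>
        text = "".join(self._buf).strip()
        if tag == "dt":
            self._name = text
        elif self._name and text:
            self.pairs.append((self._name, text))
            self._name = None
        self._tag = None


def extract_triples(entity, html):
    """Return (entity, attribute, value) triples for one entry page."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return [(entity, name, value) for name, value in parser.pairs]


def to_d3_graph(triples):
    """Convert triples into a {"nodes": [...], "links": [...]} dictionary."""
    nodes, links = {}, []
    for s, p, o in triples:
        for name in (s, o):
            nodes.setdefault(name, {"id": name})  # one node per distinct name
        links.append({"source": s, "target": o, "label": p})
    return {"nodes": list(nodes.values()), "links": links}


# Illustrative sample input, not data taken from the thesis.
SAMPLE_HTML = """
<dl>
  <dt>Founder</dt><dd><a href="#">Robin Li</a></dd>
  <dt>Headquarters</dt><dd>Beijing</dd>
</dl>
"""

triples = extract_triples("Baidu", SAMPLE_HTML)
graph = to_d3_graph(triples)
print(triples)
print(json.dumps(graph, indent=2))
```

In a full pipeline, each triple would be passed to a Neo4j driver as the `s`, `p`, and `o` parameters of `MERGE_TRIPLE`, and the JSON produced by `to_d3_graph` would be served to the front end. Using `MERGE` rather than `CREATE` keeps an entity from being duplicated when it appears in several triples.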
Keywords/Search Tags: Baidu Baike, Knowledge graph, Web crawler, Knowledge extraction, Visualization