
Research On Knowledge Processing And Graph Representation Of Unstructured Data

Posted on: 2023-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Lai
Full Text: PDF
GTID: 2568307028461794
Subject: Electronic information
Abstract/Summary:
As Internet technology becomes ubiquitous, users browsing for information are flooded with massive amounts of data. How to capture accurate and useful knowledge from unordered, large-scale, multi-source data is a main research direction of current Internet technology. With the development of artificial intelligence, the knowledge graph came into being. A knowledge graph connects fragmented entities through relations to build a structured semantic knowledge base, which makes real-world knowledge easier to understand, query, manage and apply. Knowledge graphs have spread rapidly in academia and industry and are widely used in biomedicine, financial risk control, public security, disaster prevention and other fields. General encyclopedic knowledge graphs, common-sense knowledge graphs and vertical-domain knowledge graphs have emerged one after another, and knowledge graph technology has steadily improved as researchers explore it in greater depth.

Problems and challenges remain, however. On the one hand, although existing knowledge graphs already contain a large amount of factual knowledge, most of them are built from structured or semi-structured data and ignore the large amount of useful knowledge hidden in unstructured data, so the graphs as a whole are still very sparse. At the same time, existing general knowledge graphs are difficult to apply directly to vertical industries in specific domains. How to extract new knowledge from different forms of data, either to supplement an existing knowledge graph or to build a new domain-specific one, is a problem to be solved. On the other hand, as knowledge graphs continue to grow, the traditional discrete symbolic representation makes retrieval inefficient, cannot express the semantic associations between entities, and is difficult to use widely in downstream tasks.

The sparsity problem mainly involves knowledge graph completion, in which knowledge extraction and entity linking are the main research topics; their results determine the final quality of the completed graph. For the representation problem, the current mainstream solution is knowledge graph embedding, which encodes the entities and relations of the graph as vectors in a dense, low-dimensional space. Continuous embedded representations can also effectively alleviate the sparsity problem of the knowledge graph. Dense low-dimensional vectors are also better suited to most mainstream downstream algorithms, which take feature vectors as input. The main research contents of this thesis are therefore as follows:

(1) This thesis designs a Chinese knowledge extraction method based on BERT-wwm-ext. The method learns two independent encoders, one for entity extraction and one for relation extraction. For the entity model, a span-level approach is introduced: all candidate spans that may be entities are enumerated, and the entity type of each span is determined through an activation function. For the relation model, entity boundaries and entity types are added as marker tokens before and after each entity span in the input, and all entity pairs are then classified.

(2) This thesis designs a Chinese entity linking method based on multi-dimensional feature fusion. For candidate entity generation, the method uses four generation strategies and verifies the effectiveness of the candidate entity selection method. For candidate entity ranking, the thesis treats the task as a classification problem and introduces a Chinese pre-trained model to build a classifier that computes the similarity score between each candidate entity and the entity mention.

(3) To verify the feasibility of the knowledge extraction and entity linking models, and to provide usable graph data for the research on knowledge graph representation, the thesis takes data from Chinese Wikipedia, uses the knowledge extraction and entity linking models to obtain structured knowledge from unstructured text, and applies knowledge storage technology to build an encyclopedic knowledge graph.

(4) Based on the constructed encyclopedic knowledge graph, a knowledge graph representation method combining semantic information and a graph neural network is designed to address problems such as data sparsity in discrete symbolic representations. The method integrates the semantic information of entities, relations and triples in the knowledge graph. The three kinds of semantic information are fused and encoded over multiple iterations and mapped onto the entities, yielding the embedded representation of the knowledge graph.
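
The two preprocessing steps described in item (1), enumerating candidate spans for the entity model and wrapping an entity pair in typed markers for the relation model, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `max_len` limit, the `<S:TYPE>` / `<O:TYPE>` marker format, and the example sentence are all assumptions made for illustration, and the BERT-wwm-ext encoders and classifiers are omitted.

```python
from typing import List, Tuple

# A span is (start, end) with both token indices inclusive.
Span = Tuple[int, int]


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """Return every contiguous span of at most max_len tokens.

    The entity model would score each of these spans for an entity type
    (including a "not an entity" class); that scoring is not shown here.
    """
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_len, len(tokens))):
            spans.append((start, end))
    return spans


def insert_markers(
    tokens: List[str],
    subj: Span,
    subj_type: str,
    obj: Span,
    obj_type: str,
) -> List[str]:
    """Wrap the subject and object spans in typed boundary markers.

    Assumes the two spans do not overlap. The marker strings are a
    hypothetical format chosen for this sketch.
    """
    opens = {subj[0]: f"<S:{subj_type}>", obj[0]: f"<O:{obj_type}>"}
    closes = {subj[1]: f"</S:{subj_type}>", obj[1]: f"</O:{obj_type}>"}
    out = []
    for i, tok in enumerate(tokens):
        if i in opens:
            out.append(opens[i])
        out.append(tok)
        if i in closes:
            out.append(closes[i])
    return out


# Illustrative input: Chinese text is split into single characters.
tokens = list("鲁迅出生于绍兴")
spans = enumerate_spans(tokens, max_len=2)
marked = insert_markers(tokens, (0, 1), "PER", (5, 6), "LOC")
```

In this sketch the relation model would receive `marked` as its input sequence, so the classifier sees both the boundaries and the predicted types of the two entities it is asked to relate.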
Keywords/Search Tags: knowledge extraction, entity linking, knowledge graph construction, knowledge graph representation learning