Research on Knowledge Graph Data Quality Improvement Methods

Posted on: 2024-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Zhou
GTID: 2568307076973059
Subject: Computer Science and Technology
Abstract/Summary:
As a novel kind of semantic network, Knowledge Graphs (KGs) describe the entities that exist in the real world and the semantic relations among them. KGs play an increasingly important role in artificial intelligence tasks such as product recommendation, question answering, and decision support. However, many KGs cannot meet the requirements of these tasks, and one important reason is that they contain a large amount of noise. To address the noise problem and improve the data quality of KGs, this thesis studies KG data quality improvement from three angles: noise detection, noise correction, and knowledge update.

(1) To detect various types of noise in KGs, this thesis proposes a high-accuracy KG noise detection method based on path trustworthiness and triple embedding (PTrustE). First, PTrustE constructs a correlation-based path trustworthiness network to learn the global and local features of the paths from the head entity to the tail entity of a triple. Next, to preserve the sequential nature of paths, PTrustE feeds all path features into a Bi-directional Gated Recurrent Unit to learn a path score matrix and the path trustworthiness. Finally, PTrustE uses the path score matrix for triple representation learning and the path trustworthiness to judge whether the triple is correct. KG noise detection experiments on publicly available datasets verify the effectiveness of PTrustE.

(2) To correct the various kinds of noise present in KGs, this thesis proposes a novel end-to-end model (BGAT-CCRF) that achieves a better noise correction effect. Specifically, it constructs a balanced-based graph attention model (BGAT) to learn the features of nodes in triples’ neighborhoods and to capture the correlation between nodes according to their position and frequency. It then designs a constrained conditional random field model (CCRF) that selects suitable candidates, guided by three pre-defined constraints, for correcting one or more noisy elements in a triple. In this way, BGAT-CCRF can select multiple candidates from a smaller domain and repair multiple noisy elements in a triple simultaneously. Traditional methods, by contrast, select candidates from the whole KG and can repair only one noisy element in a triple at a time. KG noise correction experiments demonstrate that BGAT-CCRF significantly outperforms current baseline methods.

(3) To update the knowledge in a KG, this thesis proposes a hierarchical and homogeneous subgraph learning model for knowledge graph relation prediction (HiHo). Specifically, it proposes a subgraph-to-sequence mechanism (Subgraph2Seq) that learns the potential semantic associations between layers in the subgraph of a single entity, and thus models the hierarchy of the subgraph. It then proposes a common preference inference mechanism (CPI) that assigns higher weights to co-occurring relations while learning the importance of each relation in the subgraphs of the two entities, and thus models the homogeneity of the subgraphs. Finally, it alternately performs induction over each layer of the two entities' subgraphs to predict the relation between them, which yields high-quality, noise-free triples. A series of experiments on five public datasets shows that HiHo performs well.
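The path-based intuition behind noise detection in contribution (1) can be illustrated with a small sketch. This is not PTrustE: the learned path trustworthiness network and the Bi-directional Gated Recurrent Unit are replaced here by a hand-written heuristic that assumes a triple is more trustworthy when alternative paths connect its head to its tail. The function names, the `decay` weighting, and the toy triples are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def find_paths(adjacency, head, tail, max_hops=3, skip=None):
    """Enumerate relation paths from head to tail using at most max_hops edges.

    The edge `skip` (the triple under test) is ignored, so a triple
    cannot serve as evidence for itself.
    """
    paths = []
    stack = [(head, (), frozenset([head]))]
    while stack:
        node, relations, visited = stack.pop()
        if len(relations) == max_hops:
            continue
        for relation, neighbor in adjacency.get(node, ()):
            if (node, relation, neighbor) == skip:
                continue
            new_relations = relations + (relation,)
            if neighbor == tail:
                paths.append(new_relations)
            elif neighbor not in visited:
                stack.append((neighbor, new_relations, visited | {neighbor}))
    return paths


def path_support(triples, candidate, max_hops=3, decay=0.5):
    """Heuristic support score (an assumption, not a learned trust value).

    Each alternative path contributes decay ** (length - 1), so shorter
    paths count more. A score of 0 means no supporting path was found.
    """
    adjacency = build_adjacency(triples)
    head, _, tail = candidate
    paths = find_paths(adjacency, head, tail, max_hops, skip=candidate)
    return sum(decay ** (len(path) - 1) for path in paths)


# Toy KG with one deliberately injected noisy triple.
triples = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "Ile-de-France"),
    ("Ile-de-France", "region_of", "France"),
    ("Paris", "capital_of", "Japan"),  # injected noise
    ("Tokyo", "capital_of", "Japan"),
]

clean_score = path_support(triples, ("Paris", "capital_of", "France"))
noisy_score = path_support(triples, ("Paris", "capital_of", "Japan"))
```

In this toy graph the correct triple is backed by the two-hop path through `Ile-de-France`, while the injected triple has no alternative path and scores zero. A learned model would replace the fixed `decay` weighting with features extracted from the paths themselves.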
Keywords/Search Tags: knowledge graph, data quality improvement, noise detection, noise correction, knowledge update