A knowledge graph is a large-scale semantic knowledge base that organizes knowledge in a graph topology, storing many facts as entities and the relations between them. Knowledge graphs aim to transform massive data and complex information into structured knowledge for building more intelligent systems, supporting applications such as natural language processing, question answering, and search engines. However, because the entities and relations in a knowledge graph often come from different data sources, data quality is difficult to guarantee, and unreliable data leaves a large amount of incorrect or incomplete information in the graph. Knowledge graph completion is an automated approach to supplementing the missing entities or relations in a knowledge graph, thereby increasing its completeness and accuracy. Given the low quality of knowledge graph data, how to complete a knowledge graph efficiently has become an important research topic. Completion also helps expand the scale and coverage of the knowledge graph, so that it reflects the underlying knowledge more comprehensively. There are two ways to obtain the information needed for completion: one is to infer new information from existing information in the knowledge graph, and the other is to extract new information from structured and unstructured data and add it to the knowledge graph.

Knowledge graph embedding is an important technique for knowledge graph completion. By mapping the entities and relations of a knowledge graph into a low-dimensional vector space, it enables efficient storage, representation, and processing of the graph. The semantic associations between each element of the knowledge graph and other elements
are translated into numerical forms that computers can process. In recent years, a growing number of researchers have used machine learning and deep learning to further optimize embedding algorithms for different needs. The family of translation models represented by TransE interprets the relation in a triplet as a translation from the head entity to the tail entity. This simplifies the complex semantic connections between entities and relations and greatly improves the efficiency of knowledge graph completion.

This paper studies the Trans (translation embedding) family in depth, including TransE, TransH, TransR, and TransD, among others. Although these translation models generalize well, little attention has been paid to how negative triplets are generated, so low-quality negative triplets enter the training process. Such negatives can prevent the model from effectively updating the entity and relation vectors during training. In addition, these models focus on embedding the entities and relations of the knowledge graph and treat the vector representations of every triplet in the same way, ignoring the mapping properties of triplets, an important feature in model learning. As a result, the learned entity and relation vectors deviate from the real-world semantics of the entities and relations.

To address these two issues, this paper improves the Trans family models in two respects:

1. A CCS negative sampling method is proposed. It clusters and caches entity sets according to the similarity between entities, and draws from the cache entities that are semantically consistent with the replaced entity of a triplet to construct high-quality negative triplets. The cache is also used to track high-quality negative triplets, which effectively alleviates the invalid training caused by low-quality negatives. This
negative sampling method is general and can be applied to any knowledge graph embedding model.

2. A weighting strategy, WTrans, is introduced. Built on the Trans family models, it incorporates the mapping properties of triplets into the model. The mapping-property information of a triplet is calculated from the degree to which the embedding vectors of that training triplet correctly represent its semantic information, and the weight of each training triplet is pre-computed from it. Treating triplets differently through these weights improves the feature learning capability of the model.

Finally, the proposed methods are evaluated on public benchmark datasets derived from WordNet and Freebase. The experimental results show that applying CCS negative sampling to the Trans models improves knowledge graph completion, and that WTrans, which further integrates triplet mapping properties into the translation model, yields an even more significant improvement. The proposed methods significantly improve the completion performance of the Trans models on almost all datasets while maintaining execution efficiency in knowledge graph embedding models.
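
The following minimal sketch illustrates why the quality of negative triplets and per-triplet weights matter for a translation model. It is not the CCS or WTrans algorithm itself: the nearest-neighbour sampler is a simplified stand-in for similarity-based sampling (it does no clustering or caching), the `weight` argument is only a placeholder for a pre-computed triplet weight, and all names, embeddings, and the margin value are hypothetical choices made for illustration.

```python
import math
import random

def transe_score(h, r, t):
    """TransE-style distance ||h + r - t||_2; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def weighted_margin_loss(pos, neg, weight=1.0, margin=1.0):
    """Margin ranking loss for one (positive, negative) pair.

    `weight` is a placeholder for a per-triplet weight (assumption)."""
    return weight * max(0.0, margin + transe_score(*pos) - transe_score(*neg))

def sample_similar_negative(replaced, candidates, emb, rng, k=1):
    """Pick a corrupting entity among the k candidates closest to the
    replaced entity in embedding space (simplified stand-in, no cache)."""
    others = [e for e in candidates if e != replaced]
    pool = sorted(others, key=lambda e: math.dist(emb[e], emb[replaced]))[:k]
    return rng.choice(pool)

# Toy 2-D embeddings (hypothetical values chosen by hand).
emb = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "germany": [1.0, 1.2],
    "banana": [-3.0, 4.0],
}
capital_of = [0.0, 1.0]

pos = (emb["paris"], capital_of, emb["france"])

# A semantically unrelated tail is already far away, so the loss is zero
# and this pair would contribute no update.
easy_neg = (emb["paris"], capital_of, emb["banana"])
easy_loss = weighted_margin_loss(pos, easy_neg)

# A tail similar to the replaced entity violates the margin and gives a
# non-zero loss, so the embeddings would actually be updated.
rng = random.Random(0)
hard_tail = sample_similar_negative("france", ["germany", "banana"], emb, rng)
hard_neg = (emb["paris"], capital_of, emb[hard_tail])
hard_loss = weighted_margin_loss(pos, hard_neg)

print(hard_tail, easy_loss, round(hard_loss, 3))
```

In this toy setting the unrelated negative yields zero loss while the similar one yields a loss of about 0.8, which mirrors the point made above that low-quality negatives fail to update the entity and relation vectors. Passing a smaller or larger `weight` scales the contribution of an individual triplet, which is the general mechanism a weighting strategy relies on.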