Design and Implementation of a Semi-Automatic Triple Annotation System

Posted on: 2024-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: W K Feng
Full Text: PDF
GTID: 2568306944457994
Subject: Computer technology
Abstract/Summary:
Knowledge graph technology is developing rapidly, the value of knowledge graphs is being explored continuously, and domain knowledge graphs of many kinds are being built. Triples play a key role in the construction and application of knowledge graphs: they connect different entities and concepts to form a rich network of knowledge relations. Because the construction requirements of domain knowledge graphs differ markedly from those of general knowledge graphs, domain triple annotation faces several challenges: (1) Domain triples must be extracted from a large amount of natural language data, which requires a great deal of manual annotation work. (2) Domain knowledge graphs require high-quality triple data. (3) Domain triples have specific types of entities and relationships, so domain-specific knowledge extraction models must be trained for triple extraction.

To address these problems, this thesis designs a data pre-annotation method based on joint extraction of entities and relations and an ALBERT-based data annotation method, and builds a semi-automatic triple annotation system. The details of the research are as follows. (1) This thesis designs a triple data pre-annotation method based on joint extraction of entities and relations and proposes a joint extraction model based on a multi-layer pointer network and a multi-head selection matrix. The entity extraction method based on the multi-layer pointer network addresses the nested entity problem, and the relation extraction method based on the multi-head selection matrix addresses the overlapping relation problem. (2) This thesis designs an ALBERT-based triple data annotation method. The method improves the accuracy of the pre-annotation model by training it over multiple cycles. It uses a lightweight joint entity-relation extraction model to make model training more efficient, while also addressing the overlapping relation problem and improving extraction accuracy by expanding the sequence annotation range. (3) This thesis designs and builds a semi-automatic annotation system for triples, which improves the efficiency of manual annotation by introducing intelligent pre-labeling of data. The system provides domain management, task management, data statistics, topic distribution and other functions, offering an efficient and convenient platform for annotating and constructing domain-oriented knowledge graphs.

Comparative experiments and system tests show that the pre-labeling method based on joint extraction proposed in this thesis performs significantly better at pre-labeling than the other joint extraction models compared; that the ALBERT-based triple data labeling method maintains a high pre-labeling quality while significantly speeding up model training; and that the system is functionally complete, with high usability and robustness.
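The abstract does not give the model's architecture in detail, so the following is only a minimal sketch of how the outputs of such a model might be decoded, under stated assumptions. It assumes one pair of start/end pointer layers per entity type (so spans of different types may nest) and one token-by-token score matrix per relation (so one entity may take part in several triples). The function names, the threshold, the entity and relation labels, and all scores are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]    # (start token, end token inclusive, entity type)
Triple = Tuple[str, str, str]  # (subject text, relation, object text)


def decode_entities(start_probs: Dict[str, List[float]],
                    end_probs: Dict[str, List[float]],
                    threshold: float = 0.5) -> List[Span]:
    """Decode spans from per-type pointer layers (assumed layout).

    Each entity type has its own start/end scores, so a span of one type
    can sit inside a span of another type (nested entities).
    """
    spans: List[Span] = []
    for etype, starts in start_probs.items():
        ends = end_probs[etype]
        for i, p_start in enumerate(starts):
            if p_start < threshold:
                continue
            # Pair each start with the nearest end at or after it.
            for j in range(i, len(ends)):
                if ends[j] >= threshold:
                    spans.append((i, j, etype))
                    break
    return spans


def decode_triples(tokens: List[str],
                   spans: List[Span],
                   rel_scores: Dict[str, List[List[float]]],
                   threshold: float = 0.5) -> List[Triple]:
    """Decode triples from a multi-head selection matrix (assumed layout).

    rel_scores[rel][i][j] scores the link from a subject ending at token i
    to an object ending at token j. Every entity pair is checked against
    every relation, so triples may share entities (overlapping relations).
    """
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    triples: List[Triple] = []
    for subj in spans:
        for obj in spans:
            if subj == obj:
                continue
            for rel, matrix in rel_scores.items():
                if matrix[subj[1]][obj[1]] >= threshold:
                    triples.append((text(subj), rel, text(obj)))
    return triples


# Illustrative scores only; a trained model would produce these values.
tokens = ["Beijing", "University", "is", "located", "in", "China"]
n = len(tokens)
start_probs = {"ORG": [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],
               "LOC": [0.8, 0.0, 0.0, 0.0, 0.0, 0.9]}
end_probs = {"ORG": [0.1, 0.9, 0.0, 0.0, 0.0, 0.0],
             "LOC": [0.9, 0.0, 0.0, 0.0, 0.0, 0.95]}
located_in = [[0.0] * n for _ in range(n)]
located_in[1][5] = 0.9  # "Beijing University" -> "China"
located_in[0][5] = 0.8  # "Beijing" -> "China"

spans = decode_entities(start_probs, end_probs)
triples = decode_triples(tokens, spans, {"located_in": located_in})
print(spans)
print(triples)
```

In this toy input, "Beijing" (LOC) is nested inside "Beijing University" (ORG), and both are linked to "China", so the two decoded triples share an object. In a semi-automatic workflow, triples decoded this way would be shown to annotators as pre-labels to confirm or correct rather than accepted directly.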
Keywords/Search Tags: Triple, Knowledge Graph, Joint Extraction, Annotation System, BERT