
Research on Information Extraction Methods for Cybersecurity Knowledge Graph Construction

Posted on: 2024-04-13 | Degree: Master | Type: Thesis
Country: China | Candidate: J F Li
GTID: 2568306941469934 | Subject: Computer Science and Technology
Abstract/Summary:
Cyber Threat Intelligence (CTI) provides a new theoretical basis for designing cybersecurity defense frameworks by collecting, organizing, and analyzing threat information. As the volume of threat information grows, CTI has become multi-source, heterogeneous, massive, and fragmented, which makes timely analysis, correlation, and fusion difficult. Knowledge Graph (KG) technology can correlate and fuse multi-source, fragmented CTI to help security experts defend against cybersecurity threats. Information extraction is the key stage in constructing a Cybersecurity Knowledge Graph (CKG) and directly affects the quality and usability of the resulting graph. This thesis therefore studies information extraction methods for CKG construction through two subtasks: named entity recognition and relationship extraction. The research contents are as follows:

(1) A cybersecurity entity recognition method based on prompt learning is proposed. Entity recognition is a fundamental task in the information extraction stage and aims to mark threat-related concepts in CTI text. The method is built on the Bidirectional and Auto-Regressive Transformers (BART) model and casts named entity recognition as a ranking problem over a pre-trained language model. Sliding windows of different sizes are used to address nested entities and the difficulty of determining entity boundaries. The original text serves as the source sequence of the BART model, and prompt templates filled with the candidate entity fragments serve as the target sequences. The entity type of each fragment is then determined from the probability scores that the BART model assigns to the different target sequences. For template construction, both manual design and automatic generation are used to find the templates that best match cybersecurity text. In addition, the models trained with different prompt templates are combined through ensemble learning, so that the templates compensate for knowledge that individual models missed during training, which further improves accuracy. Finally, the effectiveness of the model is analyzed through theoretical discussion and experiments.

(2) An entity relationship extraction method is proposed based on the Bidirectional Encoder Representations from Transformers (BERT) model and semantic fusion. A knowledge triple is the smallest component of a knowledge graph, and entity relationship extraction constructs <head entity, relationship, tail entity> triples by extracting the relationships between different threat entities. The relationship extraction task is modeled as a multi-class classification problem. Because relationship recognition between cybersecurity entities is easily affected by noisy words, external semantic features of the text are abstracted using the shortest dependency path and entity masking, and are fused to form the input of the BERT model. The BERT model then generates word embedding vectors and a global feature vector that carry contextual semantic information. On this basis, convolutional neural networks are introduced to capture local features, and the relationship between cybersecurity entities is determined from semantic features at different receptive fields. The effectiveness of the proposed method is verified by ablation and comparison experiments.

(3) A knowledge graph construction framework is designed based on the Unified Cybersecurity Ontology (UCO) model and the methods proposed in Chapters 3 and 4. CTI data from multiple heterogeneous sources are collected with crawlers, and the raw data are cleaned with hand-designed rules to produce a cybersecurity text corpus. The entity recognition and relationship extraction models then extract information from the corpus, and the extracted knowledge is represented as triples. Experiments are designed to verify the usability of the knowledge graph generated by the proposed method.
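The entity recognition procedure in part (1) can be illustrated with a minimal Python sketch of the sliding-window and template-ranking steps. It is not the thesis implementation. The entity types, the template wording, the gazetteer, and the `toy_score` function are all hypothetical stand-ins. In the method described above, the score would come from the BART model's probability of the filled template given the source text.

```python
from typing import Callable, List, Tuple

# Hypothetical label set and template wording (assumptions, not from the thesis).
ENTITY_TYPES = ["malware", "vulnerability", "tool"]
POSITIVE_TEMPLATE = "{span} is a {etype} entity"
NEGATIVE_TEMPLATE = "{span} is not a named entity"

Scorer = Callable[[str, str], float]  # (source text, filled template) -> score


def enumerate_spans(tokens: List[str], max_window: int) -> List[Tuple[int, int, str]]:
    """Slide windows of size 1..max_window over the tokens.

    Returns (start, end, text) with an exclusive end index. Overlapping
    windows are kept, so nested candidates are scored independently.
    """
    spans = []
    for size in range(1, max_window + 1):
        for start in range(len(tokens) - size + 1):
            end = start + size
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


def fill_templates(span_text: str) -> List[Tuple[str, str]]:
    """Build one target sentence per entity type, plus a non-entity sentence."""
    filled = [
        (etype, POSITIVE_TEMPLATE.format(span=span_text, etype=etype))
        for etype in ENTITY_TYPES
    ]
    filled.append(("O", NEGATIVE_TEMPLATE.format(span=span_text)))
    return filled


def classify_span(source: str, span_text: str, score: Scorer) -> str:
    """Return the label whose filled template receives the highest score."""
    best_label, _ = max(fill_templates(span_text), key=lambda c: score(source, c[1]))
    return best_label


# Toy gazetteer used only to make the sketch runnable without a trained model.
_GAZETTEER = {"WannaCry": "malware", "CVE-2017-0144": "vulnerability"}
_KNOWN_TARGETS = {
    POSITIVE_TEMPLATE.format(span=s, etype=t) for s, t in _GAZETTEER.items()
}


def toy_score(source: str, target: str) -> float:
    """Stand-in scorer: a real system would use a seq2seq likelihood here."""
    if target in _KNOWN_TARGETS:
        return 1.0
    if target.endswith("is not a named entity"):
        return 0.5
    return 0.0


if __name__ == "__main__":
    text = "WannaCry exploits CVE-2017-0144"
    for start, end, span in enumerate_spans(text.split(), max_window=2):
        print(start, end, span, "->", classify_span(text, span, toy_score))
```

Passing the scorer as an argument keeps the span enumeration and template filling independent of any particular model, so swapping in a trained sequence-to-sequence scorer would not change the rest of the sketch.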
Keywords: Cybersecurity Knowledge Graph, Information Extraction, Cyber Threat Intelligence, Knowledge Graph Construction