Research On Information Extraction Methods For Cybersecurity Knowledge Graph Construction

Posted on:2024-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:J F Li

Full Text:PDF

GTID:2568306941469934

Subject:Computer Science and Technology

Abstract/Summary:

Cyber Threat Intelligence (CTI) provides a new theoretical basis for designing cybersecurity defense frameworks by collecting, organizing, and analyzing threat information. As the volume of threat information grows, CTI has become multi-source, heterogeneous, massive, and fragmented, which makes timely analysis, correlation, and fusion difficult. Knowledge Graph (KG) technology can correlate and fuse multi-source, fragmented CTI to help security experts defend against cybersecurity threats. Information extraction is the key stage in constructing a Cybersecurity Knowledge Graph (CKG) and directly affects the quality and usability of the CKG. This paper therefore studies information extraction methods for CKG construction through two subtasks: named entity recognition and relationship extraction. The research contents are as follows:

(1) A cybersecurity entity recognition method based on prompt learning is proposed. Entity recognition is a fundamental task in the information extraction stage and aims to mark threat-related concepts in CTI. The method is built on the Bidirectional and Auto-Regressive Transformers (BART) model and treats named entity recognition as a ranking problem for a pre-trained language model. Sliding windows of different sizes are used to address entity nesting and the difficulty of determining entity boundaries in cybersecurity text. The original text is used as the source sequence of the BART model, and the prompt templates filled with the candidate entity fragments are used as the target sequences. Entity types in CTI are then identified from the probability scores that the BART model assigns to the different sequences. For template construction, this paper designs both manual and automatic generation methods to find the templates that best match cybersecurity texts. In addition, the models trained with different prompt templates are combined through ensemble learning to further improve accuracy, so that the templates compensate for knowledge that each one misses during training. Finally, the effectiveness of the model is analyzed through theoretical elaboration and experiments.

(2) An entity relationship extraction method is proposed based on the Bidirectional Encoder Representations from Transformers (BERT) model and semantic fusion. A knowledge triple is the smallest component of a knowledge graph, and entity relationship extraction constructs <head entity, relationship, tail entity> triples by extracting the relationships between different threat entities. In this paper, relationship extraction is modeled as a multi-class classification problem. Because relationship recognition for cybersecurity entities is easily affected by noisy words, this paper abstracts external semantic features of cybersecurity texts using the shortest dependency path and entity masking, and fuses them as the input of the BERT model. The BERT model then generates word embedding vectors and global feature vectors that carry contextual semantic information. On this basis, convolutional neural networks are introduced to capture local features, and the relationship between cybersecurity entities is determined from semantic features at different receptive fields. The effectiveness of the proposed method is verified by ablation and comparison experiments.

(3) A knowledge graph construction framework is designed based on the Unified Cybersecurity Ontology (UCO) model and the methods proposed in Chapters 3 and 4 of this paper. CTI data from multiple heterogeneous sources are collected with crawlers, and the raw data are cleaned with hand-designed rules to produce a cybersecurity text corpus. The entity recognition and relationship extraction models then extract information from the corpus, and the extracted knowledge is represented as triples. Experiments are designed to verify the usability of the knowledge graph generated by the proposed method.
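
The prompt-based entity recognition in (1) can be illustrated with a minimal sketch: sliding windows of several sizes enumerate candidate spans, each span is filled into one template per entity type, and the type whose filled template scores highest is chosen. The label set, the template wording, and the helper names (`TEMPLATES`, `enumerate_spans`, `classify_spans`, `toy_score`) are all assumptions made for illustration and are not taken from the thesis. In particular, `toy_score` is a lexicon lookup standing in for the BART probability score, so the sketch shows only the ranking logic, not the model.

```python
from typing import Callable, List, Tuple

# Hypothetical label set and template wording (not taken from the thesis).
TEMPLATES = {
    "Malware": "{span} is a malware entity",
    "Vulnerability": "{span} is a vulnerability entity",
    "O": "{span} is not a named entity",
}


def enumerate_spans(tokens: List[str], max_window: int) -> List[Tuple[int, int]]:
    """Slide windows of size 1..max_window over the tokens.

    Returns (start, end) index pairs with end exclusive.
    """
    spans = []
    for size in range(1, max_window + 1):
        for start in range(0, len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans


def classify_spans(
    tokens: List[str],
    score: Callable[[str, str], float],
    max_window: int = 3,
) -> List[Tuple[str, str]]:
    """Rank the filled templates for every candidate span; keep non-"O" winners.

    `score(source, target)` is assumed to return a higher value when the
    target sentence is a more plausible continuation of the source text.
    """
    source = " ".join(tokens)
    results = []
    for start, end in enumerate_spans(tokens, max_window):
        span = " ".join(tokens[start:end])
        best_label = max(
            TEMPLATES,
            key=lambda label: score(source, TEMPLATES[label].format(span=span)),
        )
        if best_label != "O":
            results.append((span, best_label))
    return results


def toy_score(source: str, target: str) -> float:
    """Stand-in scorer: a tiny lexicon lookup instead of a BART sequence score."""
    lexicon = {"WannaCry": "malware", "CVE-2017-0144": "vulnerability"}
    span = target.split(" is ")[0]
    kind = lexicon.get(span)
    if kind is None:
        return 1.0 if "not a named entity" in target else 0.0
    return 1.0 if f"a {kind} entity" in target else 0.0


if __name__ == "__main__":
    sentence = ["WannaCry", "exploits", "CVE-2017-0144"]
    print(classify_spans(sentence, toy_score))
```

In a real system the scorer would be replaced by the sequence probability of the target template given the source text under the fine-tuned model. The sketch also does not resolve overlapping or nested predictions, and it does not combine several templates by ensembling, both of which the abstract mentions.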

Keywords/Search Tags:

Cybersecurity Knowledge Graph, Information Extraction, Cyber Threat Intelligence, Knowledge Graph Construction

Related items

1	Research On Key Technologies For Construction And Application Of Threat Intelligence Knowledge Graph
2	Research On Key Technologies For Construction And Application Of Cyber Threat Intelligence Knowledge Graph
3	Research On Construction Of Knowledge Graph For Cyber Threat Intelligence
4	Research And Application Of Threat Intelligence Knowledge Graph Construction Method For Unstructured Data
5	Research On Knowledge Graph Construction Technology For Cyber Threat Intelligence
6	Research On Knowledge Graph Construction Techniques For Dark Web Threat Intelligence
7	Construction And Application Of Cyber Security Knowledge Graph For APT Attack
8	Research On Key Technologies Of Cybersecurity Knowledge Graph Construction
9	Research On The Construction And Application Technology Of Threat Intelligence Knowledge Graph
10	Research And Implementation Of Cybersecurity Knowledge Graph Construction Technology