
Research On Cyber Security Threat Discovery And Tracking Technology Based On Topic Detection

Posted on: 2020-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
Full Text: PDF
GTID: 2428330572972245
Subject: Computer Science and Technology
Abstract/Summary:
The emergence of malicious software and advanced persistent threats (APTs) requires security experts to analyze and detect network threats in real time from open source data and to transform them into readable threat intelligence that helps security analysts respond quickly and defend against emerging cyber threats. However, it is not feasible to identify cyber threats manually from large volumes of unstructured open source text. For these reasons, we need multi-dimensional knowledge discovery and data mining methods to help our system understand network threats better. First, we extract threat-related information from multi-source security data; we then synthesize these knowledge fragments into a higher-level concept that describes a potential threat. In other words, the task is to identify upcoming security topics in real time from fragments of open source threat information, form threat intelligence, and help security personnel respond quickly to emerging cyber threats. Most previous work (e.g., SecurityWeek, the ThreatBook system) focuses on using machine learning to discover general threat categories rather than real-time threat targets. Previous systems either require keywords as input or return only general threat categories (e.g., Virus & Threats) instead of a specific threat (e.g., an APT).

Therefore, we propose a novel method, FAC-CTI (network threat intelligence detection based on domain feature extraction and improved hierarchical clustering), to analyze open source threat data and identify emerging threat topics in real time. The FAC-CTI method for threat topic detection consists of three parts: data collection and preprocessing, key feature extraction, and topic clustering. In the first part, the data acquisition module collects security data of all kinds from security BBS forums and security information websites. In the second part, we propose three feature extraction methods: (1) building on keyword recognition with TF-IDF (Term Frequency-Inverse Document Frequency), an incremental TF-IDF method that takes word location and part of speech into account when calculating word weights and extracting keyword features; (2) a Latent Dirichlet Allocation (LDA) method that combines word vectors trained with a transfer-learning word vector model, word similarity, and a domain filtering strategy to identify topic features; (3) an entity recognition method that identifies domain-specific entity features such as place names, person names, and security organizations. A feature fusion technique then integrates these features to build the feature vector of each article. Unlike previous open source threat intelligence work, these feature extraction methods make full use of security domain knowledge, both when extracting features and when constructing the article vector.

In the third part, building on the HAC (hierarchical agglomerative clustering) algorithm, we propose an improved hierarchical clustering algorithm that clusters the articles in each time period, mines security topics, and recognizes in real time whether a topic is emerging or continues a historical topic.

The experimental data sets come from the open source wiki and from eight security websites and BBS forums collected by crawlers. The experimental results show that the FAC-CTI method performs well and identifies threat topics reliably. The recall rate, accuracy rate, and F value of threat topic detection on the two data sets are all above 0.98, higher than those of other common topic detection methods.
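The keyword-weighting and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the FAC-CTI implementation: it computes TF-IDF in batch rather than incrementally, omits the LDA and entity features, and uses plain average-linkage agglomerative clustering instead of the improved algorithm. The boost values `TITLE_BOOST` and `POS_BOOST`, the similarity `threshold`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical weights (not taken from the thesis).
TITLE_BOOST = 2.0                        # assumed boost for words in the title
POS_BOOST = {"NOUN": 1.5, "VERB": 1.0}   # assumed part-of-speech multipliers
DEFAULT_POS_BOOST = 0.5                  # assumed weight for any other tag


def weighted_tfidf(docs):
    """docs: list of {'title': [(word, pos)], 'body': [(word, pos)]}.
    Returns one {word: weight} dict per document."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update({w for w, _ in d["title"] + d["body"]})
    vectors = []
    for d in docs:
        tf = Counter()
        for w, pos in d["title"]:
            tf[w] += TITLE_BOOST * POS_BOOST.get(pos, DEFAULT_POS_BOOST)
        for w, pos in d["body"]:
            tf[w] += POS_BOOST.get(pos, DEFAULT_POS_BOOST)
        # Smoothed IDF, so a word found in every document keeps a weight.
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse {word: weight} vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, threshold=0.3):
    """Average-linkage agglomerative clustering over document indices.
    Merging stops once the best pair is less similar than `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy, hand-tagged articles (invented for this example).
docs = [
    {"title": [("ransomware", "NOUN")],
     "body": [("ransomware", "NOUN"), ("encrypts", "VERB"), ("files", "NOUN")]},
    {"title": [("ransomware", "NOUN")],
     "body": [("ransomware", "NOUN"), ("files", "NOUN"), ("locked", "VERB")]},
    {"title": [("phishing", "NOUN")],
     "body": [("phishing", "NOUN"), ("email", "NOUN"), ("steals", "VERB")]},
]
vectors = weighted_tfidf(docs)
topics = cluster(vectors)
print(topics)  # [[0, 1], [2]]
```

Each resulting cluster stands for one candidate topic in a time window. Stopping at a similarity threshold rather than a fixed cluster count fits the setting described in the abstract, where the number of topics per period is not known in advance; comparing new clusters against those from earlier periods would be one way to separate emerging topics from continuing ones, though the thesis abstract does not specify how this is done.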
Keywords/Search Tags:threat intelligence, topic detection, transfer learning, feature fusion