| The rapid development of networks has brought many conveniences to production and daily life, but network security problems have emerged one after another. To better handle frequent network security incidents and provide decision support for security analysts, threat intelligence technology has gradually been applied to situational awareness, threat discovery, and security defense, and has become a research hotspot in network security. Driven by the task requirements of network security situational awareness, and set in the context of the China Education and Research Network (CERNET) backbone security assurance system and Structured Threat Information Expression (STIX), this thesis processes both internal threat intelligence from the security system and open, shared external threat intelligence from the Internet, and realizes the collection, fusion, semantic recognition, and sharing of multi-source threat intelligence data. The main work of this thesis is as follows. To address the limitation that the existing security assurance system can only collect internal intelligence based on network traffic monitoring, and that the intelligence it provides is limited, this thesis analyzes the sources, data formats, and content of threat intelligence, and collects threat intelligence data from several mainstream intelligence source sites. To manage the collected multi-source heterogeneous data effectively, this thesis designs a data organization form based on the STIX standard and maps heterogeneous intelligence data from different sources onto it, so that intelligence fields with different data structures are unified. The mapped multi-source data is then fused to achieve unified management and standardization of multi-source threat intelligence and to provide richer information for threat intention identification. To address the problem
that threat intelligence in unstructured text cannot be used directly, this thesis proposes a threat semantic recognition method that constructs the corresponding structured intelligence data. The threat semantic recognition task is divided into two subtasks: threat entity semantic recognition and threat entity relation extraction. First, an entity recognition model based on BERT-BiLSTM-CRF is designed, in which BERT generates dynamic word vectors to handle polysemy and the neural network model automatically extracts global features of the text, so that entities can be recognized in unstructured text. Then, a relation extraction model based on the BERT pre-trained model is designed to extract the semantic relations between entities in unstructured text. Experimental results show that the proposed recognition method can effectively identify entity and relation semantic information from unstructured intelligence text. To meet the needs of the Jiangsu Education Network for collecting and managing threat intelligence, and to provide a data basis for exchanging and sharing threat intelligence among different regions of CERNET, this thesis designs and implements a prototype threat intelligence platform named TIP-NJ. TIP-NJ provides functions such as intelligence gathering, fusion, semantic recognition, and sharing. In addition, this thesis illustrates a practical application of the gathered and organized intelligence, which demonstrates the usability of the threat intelligence on the TIP-NJ platform. |
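
The mapping and fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the TIP-NJ implementation: the feed names (`feed_a`, `feed_b`), their field names, the sample records, and the merge rule (union labels and sources, keep the earliest `valid_from`) are all assumptions made for illustration. The output dictionaries borrow a few property names from the STIX 2.1 Indicator object (`type`, `pattern`, `pattern_type`, `valid_from`, `labels`) but are simplified and are not complete, valid STIX objects.

```python
from typing import Any, Iterable

# Hypothetical per-source mappings: source field name -> unified field name.
FIELD_MAPS = {
    "feed_a": {"ip": "value", "category": "label", "first_seen": "valid_from"},
    "feed_b": {"indicator": "value", "threat_type": "label", "date": "valid_from"},
}


def map_record(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename the fields of one source-specific record to the unified names."""
    mapping = FIELD_MAPS[source]
    unified = {new: record[old] for old, new in mapping.items() if old in record}
    unified["source"] = source
    return unified


def to_indicator(unified: dict[str, Any]) -> dict[str, Any]:
    """Build a simplified, STIX-like indicator dictionary (IPv4 values only)."""
    return {
        "type": "indicator",
        "pattern": f"[ipv4-addr:value = '{unified['value']}']",
        "pattern_type": "stix",
        "valid_from": unified.get("valid_from"),
        "labels": [unified["label"]] if "label" in unified else [],
        "sources": [unified["source"]],  # non-standard field, illustration only
    }


def fuse(indicators: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge indicators that share a pattern (assumed fusion rule)."""
    fused: dict[str, dict[str, Any]] = {}
    for ind in indicators:
        key = ind["pattern"]
        if key not in fused:
            # Copy the lists so the input indicators are not mutated later.
            fused[key] = {**ind, "labels": list(ind["labels"]),
                          "sources": list(ind["sources"])}
            continue
        merged = fused[key]
        for field in ("labels", "sources"):
            for item in ind[field]:
                if item not in merged[field]:
                    merged[field].append(item)
        # Timestamps in one ISO 8601 UTC format compare correctly as strings.
        dates = [d for d in (merged["valid_from"], ind["valid_from"]) if d]
        merged["valid_from"] = min(dates) if dates else None
    return list(fused.values())


# Made-up sample records using documentation-reserved IP addresses.
raw = [
    ("feed_a", {"ip": "198.51.100.7", "category": "botnet",
                "first_seen": "2021-03-02T00:00:00Z"}),
    ("feed_b", {"indicator": "198.51.100.7", "threat_type": "scanner",
                "date": "2021-02-20T00:00:00Z"}),
    ("feed_b", {"indicator": "203.0.113.9", "threat_type": "botnet",
                "date": "2021-03-05T00:00:00Z"}),
]
fused = fuse(to_indicator(map_record(src, rec)) for src, rec in raw)
```

In this sketch the two records for `198.51.100.7` collapse into one entry carrying both labels and both source names, while `203.0.113.9` remains a separate entry. A real platform would also need to handle other observable types, conflicting values, and source trust, which the abstract does not specify.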