| The Internet has become an important force in the new era for improving production efficiency, promoting innovation and change, and accelerating human development. With the vigorous development of network information technology, cyberspace security threats have gradually penetrated social production and daily life. Traditional data mining and analysis methods in the field of cyber security can no longer support China's Internet industry as it moves toward a new historical inflection point. Knowledge graphs, as a technical means of processing and visualizing unstructured data, have attracted intense research interest in China and abroad. This paper studies the knowledge extraction techniques involved in constructing a knowledge graph for the Chinese threat intelligence domain, namely named entity recognition and entity relationship extraction. At present, most Chinese named entity recognition methods match the text sequence against a dictionary to obtain candidate words and then introduce this vocabulary information through a lattice or graph structure. However, these two ways of integrating vocabulary knowledge do not consider global semantic interaction, introduce many interfering words, and fail to effectively resolve word boundary conflicts. Most current Chinese entity relationship extraction methods classify relations with models based on character-level input, which does not make full use of the lexical information and entity information in the input sequence. To address these problems, for the named entity recognition task this paper proposes a knowledge fusion method based on Lexicon-matched Word Inject (LWI), whose novelty lies in how it extracts semantic information from the input sequence and how it uses the lexical information in that sequence. The method encodes characters with a pre-trained language model, captures sentence context features with a Transformer encoder, injects the matched lexicon words for each character, and then fuses each character with its different words through a multi-head self-attention mechanism to improve recognition performance. For the relationship extraction task, this paper proposes a method based on multi-feature embedding, whose novelty lies in the feature information embedded into the model. The method investigates how to perform multi-feature embedding in the input representation layer of the entity relationship extraction model. Specifically, the head and tail entity embedding vectors, the position feature vectors of each character relative to the head and tail entities, and the embedding vectors of external vocabulary words found in the input sequence are integrated into the character vectors, which serve as the input to the model's encoding layer. A BiLSTM model then performs feature extraction to improve extraction performance. To validate the two models, experiments are conducted on a general-domain dataset and a self-built threat intelligence dataset. The results show that both models perform well on both types of datasets. |
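
Two of the input features described above, dictionary matching per character and character positions relative to the head and tail entities, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `max_len` parameter, the toy lexicon, and the example sentence are all assumptions made for illustration, and the embedding, attention, and BiLSTM layers are omitted.

```python
from typing import Dict, List, Set, Tuple


def match_lexicon(chars: str, lexicon: Set[str], max_len: int = 4) -> Dict[int, List[str]]:
    """Map each character index to the lexicon words that cover it.

    Hypothetical helper: scans every substring of length 2..max_len and
    records a match for each character inside a matched word.
    """
    matches: Dict[int, List[str]] = {i: [] for i in range(len(chars))}
    for start in range(len(chars)):
        for end in range(start + 2, min(start + max_len, len(chars)) + 1):
            word = chars[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


def relative_positions(length: int, span: Tuple[int, int]) -> List[int]:
    """Signed distance from each character to an entity span [start, end).

    Characters inside the span get 0, characters to the left get negative
    offsets, and characters to the right get positive offsets.
    """
    start, end = span
    positions: List[int] = []
    for i in range(length):
        if i < start:
            positions.append(i - start)
        elif i >= end:
            positions.append(i - end + 1)
        else:
            positions.append(0)
    return positions


if __name__ == "__main__":
    # Toy example (assumed, not from the paper's datasets).
    sentence = "黑客利用木马攻击服务器"
    lexicon = {"黑客", "木马", "攻击", "服务", "服务器"}

    word_matches = match_lexicon(sentence, lexicon)
    head_pos = relative_positions(len(sentence), (4, 6))   # head entity: 木马
    tail_pos = relative_positions(len(sentence), (8, 11))  # tail entity: 服务器

    for i, ch in enumerate(sentence):
        print(ch, word_matches[i], head_pos[i], tail_pos[i])
```

In this toy sentence the characters 服 and 务 match both 服务 and 服务器, which is the kind of word boundary conflict the abstract refers to. In a full model, each matched word and each relative position would be looked up in an embedding table and combined with the character vector before the encoding layer.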