| Malicious activity on the Internet is one of the most dangerous threats to users and organizations. Because domain names are flexible and easy to obtain, criminals often use them to launch cyber attacks, so detecting malicious domain names has become a focus of current research. Traditional detection methods mostly rely on feature engineering to learn the characteristics of malicious domain names, but they are easily defeated by sophisticated evasion techniques. Focusing on the topological features of domain names avoids this problem. Some recent graph inference methods have achieved good results in detecting malicious domain names, but they still have drawbacks: the detection methods are inflexible, they consider only local topological relations, and they ignore node attributes. This thesis therefore first proposes a technique for generating malicious domain name training data to support further research, and then, to address the shortcomings of existing graph-inference-based detection, focuses on building graphs from DNS traffic and using them to detect malicious domain names. The main work is as follows: (1) A malicious domain name training data generation technique based on an improved CNN architecture is proposed. To address the shortage of malicious domain name training data, the method improves the traditional CNN model by incorporating a Bi-LSTM with a self-attention mechanism and drawing on ideas from text generation. The experimental results show that the generated domain names are highly similar to real malicious domain names and can be used in malicious domain name detection experiments. (2) A graph inference algorithm that analyzes the global association between domain names is proposed. Most current techniques for identifying malicious domain names from DNS data build classifiers on local, DNS-related features, but local features such as domain name features and temporal features are not robust, which leads to inaccurate detection results. This thesis uses passive DNS data to analyze the global association between domain names instead of relying only on local features, so that a large number of new malicious domain names can be discovered from a very small set of known malicious domain names. An algorithm is presented that computes a reputation score for each node in a DNS graph. The DNS graph is built from domain names and host IP addresses, and the intrinsic relationships between them are mined. Additional conditions are added to the belief propagation (BP) algorithm to compute the reputation score of each node, and the probability that a node is malicious is inferred from that score. The superiority of the method is verified by comparison with existing malicious domain name detection methods. (3) An attributed heterogeneous graph neural network model, AHGN, is proposed for detecting malicious domain names in a semi-supervised learning paradigm. The DNS scenario is modeled as an attributed heterogeneous information network, and inference on the DNS graph is performed with a fine-grained node-type-aware feature transformation and an edge-type-aware aggregation mechanism that fuse node attributes and structural information simultaneously. The model is evaluated experimentally on large-scale passive DNS data, and the results show that it outperforms previously proposed methods on most evaluation metrics. |
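
To make the graph inference idea in item (2) concrete, the following is a minimal sketch of standard loopy belief propagation on a bipartite domain/IP graph. It is not the modified algorithm of this work: the additional conditions mentioned above are not specified here and are not reproduced. The function names `build_graph` and `malicious_scores`, the homophily strength `eps`, the prior values 0.99/0.01, and the example domains and addresses are all illustrative assumptions.

```python
from collections import defaultdict


def build_graph(resolutions):
    """Build an undirected bipartite graph from (domain, ip) resolution pairs."""
    graph = defaultdict(set)
    for domain, ip in resolutions:
        d, h = ("domain", domain), ("ip", ip)
        graph[d].add(h)
        graph[h].add(d)
    return graph


def malicious_scores(graph, known_malicious=(), known_benign=(), eps=0.2, iters=10):
    """Run loopy belief propagation with two states: 0 = benign, 1 = malicious.

    Returns {domain: estimated probability of being malicious}.
    All numeric constants are assumptions chosen for illustration.
    """

    def prior(node):
        kind, name = node
        if kind == "domain" and name in known_malicious:
            return (0.01, 0.99)
        if kind == "domain" and name in known_benign:
            return (0.99, 0.01)
        return (0.5, 0.5)  # unlabeled domains and all IPs start neutral

    # Edge potential: neighbours tend to share the same state (homophily).
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))

    # One message per directed edge, initialised to uniform.
    msgs = {(u, v): (0.5, 0.5) for u in graph for v in graph[u]}

    for _ in range(iters):
        new_msgs = {}
        for (u, v) in msgs:
            phi = prior(u)
            out = [0.0, 0.0]
            for s in (0, 1):
                # Prior of u times all messages into u except the one from v.
                incoming = phi[s]
                for w in graph[u]:
                    if w != v:
                        incoming *= msgs[(w, u)][s]
                for t in (0, 1):
                    out[t] += incoming * psi[s][t]
            total = out[0] + out[1]
            new_msgs[(u, v)] = (out[0] / total, out[1] / total)
        msgs = new_msgs

    scores = {}
    for node in graph:
        kind, name = node
        if kind != "domain":
            continue
        belief = list(prior(node))
        for w in graph[node]:
            for s in (0, 1):
                belief[s] *= msgs[(w, node)][s]
        scores[name] = belief[1] / (belief[0] + belief[1])
    return scores


# Hypothetical passive DNS observations: (domain, resolved host IP).
resolutions = [
    ("bad.example", "10.0.0.1"),
    ("unknown.example", "10.0.0.1"),
    ("good.example", "10.0.0.2"),
    ("other.example", "10.0.0.2"),
]
scores = malicious_scores(
    build_graph(resolutions),
    known_malicious={"bad.example"},
    known_benign={"good.example"},
)
```

In this toy graph, a domain that shares a host IP with a known malicious domain ends up with a score above 0.5, and one that shares an IP with a known benign domain ends up below 0.5. That is the sense in which a small labeled seed set can propagate to unlabeled domains. On graphs with high-degree nodes, the message products should be computed in log space to avoid underflow.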