| Domain names play an irreplaceable role in ever-increasing Internet activity. Malicious network activities such as malware and phishing websites usually rely on malicious domain names to carry out attacks, so detecting malicious domain names is an important means of maintaining network security. As the technologies behind malicious domain names develop, new types are constantly being discovered and their number keeps growing, which poses a more severe challenge to detection. This paper studies two kinds of malicious domain names, those generated by a Domain Generation Algorithm (DGA) and phishing website domain names, and analyzes current detection methods. To improve the adaptability and classification performance of malicious domain name detection models, it studies detection methods that use the features of the domain names themselves. First, the generation mode of DGA domain names and their character composition are analyzed, and a convolutional neural network (CNN) is used to classify them. The classification model gives the data set semantic features through word embedding, sets the size of the convolution kernel according to the pronunciation rules of words, and determines the model structure and the parameters of each layer through multiple experiments. Compared with other commonly used classification models, this model does not require manual feature extraction and has stronger adaptability and better classification performance. For most DGA types, the classification accuracy can exceed 95%. Next, because the samples of some DGA domain types are insufficient, transfer learning is combined with the CNN to construct an improved TCNN (Transfer Convolutional Neural Networks) model. Compared with the CNN-based DGA detection model, the TCNN model introduces knowledge transfer
to the detection of DGA domain names for the first time. This model adds a pre-training process to the CNN model established in this study, uses DGA domain name samples of other categories for pre-training, and transfers the model parameters obtained by pre-training to the training of the small-sample classification model. Compared with the small-sample classification model without knowledge transfer, the classification performance and speed of this model are significantly improved, and the performance is close to that achieved when samples are sufficient. Finally, the character features and information features of phishing website domain names are analyzed, and a LightGBM classification model is built on these features. Feature extraction algorithms and crawler programs are written to obtain 16 features of the domain names themselves, such as the minimum number of domain divisions and the domain name's IP addresses. The model's parameter settings are determined through experiments, and its classification accuracy reaches 93%. The model is then compared with other classification models to demonstrate its better classification performance. |
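
As a rough illustration of what character-level features of a domain name can look like, the following Python sketch computes a few common lexical features. The feature set and the function name `lexical_features` are assumptions chosen for illustration; they are not the 16 features used in this study, and crawled information features such as IP addresses are not covered.

```python
import math
from collections import Counter


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features of a domain name.

    Hypothetical feature set for illustration only; not the paper's features.
    """
    name = domain.lower().rstrip(".")
    labels = name.split(".")          # dot-separated parts of the name
    chars = name.replace(".", "")     # characters without the separators
    total = len(chars)
    counts = Counter(chars)

    # Shannon entropy of the character distribution, in bits per character.
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    return {
        "length": len(name),
        "num_labels": len(labels),
        "min_label_length": min(len(label) for label in labels),
        "digit_ratio": sum(ch.isdigit() for ch in chars) / total if total else 0.0,
        "hyphen_count": name.count("-"),
        "entropy": entropy,
    }


if __name__ == "__main__":
    for d in ("example.com", "a1-b2.example.org"):
        print(d, lexical_features(d))
```

A dictionary of this kind could be turned into one row of a feature matrix for a gradient-boosted tree classifier such as LightGBM; algorithmically generated names tend to show higher character entropy than dictionary-based ones, which is one reason this feature is commonly tried.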