| With the rapid development of the Internet and related technologies, the problem of network information security has become increasingly prominent. While people enjoy the convenience that networks bring, they also face a large number of malicious network attacks, such as spam, phishing, and click fraud, most of which rely on malicious URLs. A malicious phishing URL is a special type of malicious URL: it imitates the domain names of well-known websites to confuse users, which reduces the accuracy of traditional malicious URL detection methods. To identify malicious phishing URLs effectively, this paper proposes a detection method based on deep learning and natural language processing, with the aim of improving detection and identification accuracy. Detecting malicious phishing URLs with deep learning requires studying the model's input, the model's construction, and the evaluation of the model's output. This paper therefore focuses on the character encoding method, feature space extraction and selection, and the evaluation of feature extraction quality. Regarding character encoding, the string in each URL component (protocol field, host name field, path field, file name field, and parameter field) has its own specific form and meaning. After comparing the encoding quality of common character encoding methods, this paper constructs a URL data representation model based on a skip-gram encoder, which densely encodes URLs at the finest granularity and transforms each URL string, within a specified time, into a numeric vector that is machine-readable and easier to recognize. Regarding feature space extraction and selection, a convolutional neural network (CNN) model and a long short-term memory (LSTM) network model are selected from among deep learning models to perform spatial dimensionality 
reduction and feature extraction on malicious phishing URL data, drawing on the implicit semantics, temporal order, and other features of the URL data. This paper constructs a multi-layer LSTM model, a multi-layer CNN model, and a combined CNN + LSTM model, uses them to extract features from malicious phishing URLs, and compares the results. To evaluate detection performance, four classification algorithms (random forest, random tree, Bayes, and J48) are used to classify the feature vectors extracted by each model, and each classifier is assessed with several metrics, including accuracy, error rate, and recall. After evaluating the feature extraction results of the LSTM, CNN, CNN + LSTM, and eXpose multi-kernel convolution models, CNN + LSTM is selected as the best feature extraction model for malicious phishing URL data; it achieves the highest detection accuracy, 98.3%. The experimental results show that the proposed model effectively improves the efficiency and accuracy of malicious phishing URL detection and has good application prospects in the field of network security. |
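
The input preparation described in the abstract can be illustrated with a minimal sketch. It splits a URL into the five components the abstract names and then generates character-level (center, context) pairs of the kind a skip-gram encoder is trained on. The function names `split_url` and `skipgram_pairs`, the window size, and the sample URL are assumptions made for illustration; they are not taken from the paper, which does not specify its tokenization or window settings.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into protocol, host, path, file name, and parameters.

    Hypothetical helper: the paper names these five components but does
    not say how it separates them.
    """
    parts = urlsplit(url)
    # Separate the directory part of the path from the final file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "path": directory,
        "file": filename,
        "params": parts.query,
    }


def skipgram_pairs(text, window=2):
    """Return (center, context) character pairs within +/- `window` positions.

    The window size is an assumed value, not one reported in the paper.
    """
    pairs = []
    for i, center in enumerate(text):
        lo = max(0, i - window)
        hi = min(len(text), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, text[j]))
    return pairs


if __name__ == "__main__":
    # Made-up URL on a reserved example domain, imitating a look-alike host.
    url = "http://paypa1.example.com/login/verify.php?id=42"
    components = split_url(url)
    for name, value in components.items():
        print(f"{name:9s} {value}")
    print(skipgram_pairs(components["file"], window=1)[:4])
```

In a full pipeline, pairs like these would be used to train character embeddings, and the resulting vector sequences would be fed to the CNN and LSTM feature extractors; that training step is outside the scope of this sketch.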