| In recent years, the attack-and-defense contest around malicious websites has continued, and malicious websites have become increasingly complex and evasive. More and more malicious websites employ cloaking techniques to evade detection, which makes it difficult for existing client honeypots to capture the characteristic information of the attack process and causes existing malicious website detection methods to fail or to miss many attacks. To address these problems, we design and implement a method for capturing the fine-grained behavior of web pages, and we propose a method for detecting malicious websites based on the similarity of attack behavior. Our work and contributions are as follows: (1) We design and implement a fine-grained behavior capture system for malicious web pages based on a low-interaction honeyclient. By extending the existing honeyclient kernel, the system records the behaviors of the important execution entities of a web page during rendering, together with the calling relationships between them, and then characterizes the attack process of a malicious website as a web page execution graph. The experimental results show that the system captures fine-grained behavior features and that the web page execution graph effectively represents malicious website behavior in real scenarios. Under reasonable resource constraints, the system can detect and analyze 300,000 URLs per day. (2) We propose a malicious website detection method based on the similarity of web page execution subgraphs. The method exploits the similarity among malicious websites in attack stages such as environment detection and redirection to detect maliciousness. Specifically, it introduces a web page execution graph clustering algorithm that groups malicious samples with similar substructures by calculating the similarity between web page execution graphs, and then extracts common substructures from the clustering results. The resulting paths are used as attack templates, and maliciousness is finally determined by comparing the similarity between the web page under test and the attack templates. The experimental results show that the method identifies 98.7% of malicious web pages, with high detection accuracy (97.0%) and a low false positive rate (5.0%). |
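
The detection pipeline in contribution (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. It makes several simplifying assumptions: an execution graph is reduced to a set of caller/callee edges, graph similarity is approximated by Jaccard similarity over those edges, clustering is a greedy single pass, and an attack template is the set of edges shared by all members of a cluster rather than the extracted paths the abstract describes. The entity labels and thresholds are invented for illustration.

```python
from typing import FrozenSet, List, Tuple

# Assumption: an execution graph is a set of (caller entity, callee entity) edges.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def jaccard(a: Graph, b: Graph) -> float:
    """Edge-set Jaccard similarity, a stand-in for the graph similarity measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(graphs: List[Graph], threshold: float) -> List[List[Graph]]:
    """Greedy clustering: a graph joins the first cluster whose seed is similar enough."""
    clusters: List[List[Graph]] = []
    for g in graphs:
        for members in clusters:
            if jaccard(g, members[0]) >= threshold:
                members.append(g)
                break
        else:
            # No existing cluster matched, so this graph seeds a new one.
            clusters.append([g])
    return clusters


def template(members: List[Graph]) -> Graph:
    """Edges shared by every member of a cluster (simplified attack template)."""
    return frozenset.intersection(*members)


def is_malicious(page: Graph, templates: List[Graph], threshold: float) -> bool:
    """Flag the page if it covers enough of any non-empty template."""
    return any(t and len(page & t) / len(t) >= threshold for t in templates)


# Hypothetical malicious samples: two share a fingerprint-then-redirect stage.
sample_a: Graph = frozenset({
    ("html", "script:fingerprint"),
    ("script:fingerprint", "redirect"),
    ("redirect", "iframe"),
})
sample_b: Graph = frozenset({
    ("html", "script:fingerprint"),
    ("script:fingerprint", "redirect"),
    ("redirect", "plugin"),
})
sample_c: Graph = frozenset({
    ("html", "script:obfuscated"),
    ("script:obfuscated", "download"),
})

clusters = cluster([sample_a, sample_b, sample_c], threshold=0.5)
templates = [template(members) for members in clusters]

# A page under test that repeats the fingerprint-then-redirect stage.
suspect: Graph = frozenset({
    ("html", "script:fingerprint"),
    ("script:fingerprint", "redirect"),
    ("redirect", "exploit"),
})
print(is_malicious(suspect, templates, threshold=0.8))
```

Intersecting edge sets keeps the sketch short, but it ignores edge order and path structure. A closer approximation of the described method would extract common paths or subgraphs from each cluster and score the page under test against them.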