| In recent years, the booming Internet has brought enormous convenience to people's daily lives. At the same time, convenient web services have attracted many attackers who seek illegal profits through malware, phishing, and spam e-mail. The web pages used in these attacks are called malicious web pages. Although all of them rely on unsuspecting users visiting a web address supplied by the attacker, the purposes and means of these illegal activities differ, and the actual number of pages of each kind and the cost of misclassifying them also differ greatly. Current research mainly targets a single kind of malicious web page, and research on multi-class classification of malicious web pages is still relatively rare. The features extracted in machine-learning-based classification research also still need improvement. In view of these problems, this thesis proposes a supervised machine learning classification method that combines CSS features and URL features and takes into account both data imbalance and differing misclassification costs. The "misclassification cost sum" is proposed as a new metric. A three-class study is carried out on phishing pages, malware download pages, and benign web pages. The main results of the thesis are as follows. Firstly, new CSS features are proposed, and their effectiveness in identifying malicious web pages that serve malware downloads is demonstrated. By combining and adding features, the recognition accuracy over all malware download web pages is improved and stabilizes at 92%, the recognition accuracy for malware web pages that use redirection reaches up to 99%, and time performance is improved. Secondly, the new misclassification cost metric is proposed, taking into account the actual data ratio and the misclassification costs. A large number of experiments were carried out to demonstrate the soundness of the new metric and of the classification method. Finally, combining theoretical
analysis and engineering technology, this thesis designs and implements a multi-class malicious web page identification system for phishing web pages and malware download web pages, and tests the accuracy and stability of the system. |