Malware is software that performs malicious tasks on a computer system. As Internet technology develops, malware attacks are growing exponentially and have become a key threat to Internet security. Malware detection is essential for preventing the exploitation of security holes, the theft of Internet data, and many other threats, and it has become an important research area. Traditional detection methods are vulnerable to obfuscation and code deformation, which reduce the accuracy and efficiency of malware detection. With the progress of machine learning and deep learning, researchers have applied both technologies to malware detection and achieved remarkable results. However, current malware research largely ignores the relative importance of attributes. This leads to low detection efficiency and obscures the useful information in the data. Class imbalance in the datasets can also reduce detection accuracy. To address these problems, this paper proposes a malware detection method based on stacking. The main contributions are threefold:

(1) Malware attributes are high-dimensional, so it is difficult to identify the important ones, and detection accuracy and efficiency both suffer. This paper combines the optimum-seeking ability of the Beetle Antennae Search (BAS) algorithm with an attribute-importance formula to optimize the number of dimensions retained by PCA. Less informative attributes are removed and the more important ones are kept. This simplifies the data, highlights the important information, and reduces the time and space complexity of detection, while preserving the original data distribution and the key information.

(2) Class imbalance and attribute importance both affect the accuracy of malware detection. To address this, the paper improves the random forest algorithm and proposes a random forest optimization algorithm based on fuzzy decision, which serves as one of the base classifiers of the stacking algorithm. This scheme addresses the data imbalance problem and improves detection accuracy.

(3) The optimized random forest, gradient boosting decision tree (GBDT), and logistic regression algorithms are integrated into the stacking algorithm. The preprocessed datasets are used as malware samples under cross-validation, and the stacking algorithm is applied to malware detection. The proposed detection method and optimization algorithms are also compared and analyzed experimentally. The evaluation uses objective metrics, namely accuracy, precision, recall, and AUC, for multi-dimensional verification. The experimental results show that the proposed method effectively improves detection accuracy and efficiency compared with currently popular malware detection methods.
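The abstract does not give the attribute-importance formula or the BAS parameters. The following Python sketch therefore only illustrates the general idea of using Beetle Antennae Search to choose how many principal components PCA keeps. It makes several assumptions:

- The PCA eigenvalues (component variances) have already been computed.
- `pca_dimension_score` is a hypothetical objective that trades discarded variance against the number of kept dimensions.
- `bas_select_dimension`, the penalty weight, and the step, antenna, and decay settings are all illustrative choices.

```python
import random


def pca_dimension_score(k, eigenvalues, penalty=0.1):
    """Hypothetical objective (lower is better): fraction of variance discarded
    by keeping the top-k principal components, plus a penalty on k / d."""
    d = len(eigenvalues)
    ranked = sorted(eigenvalues, reverse=True)
    retained = sum(ranked[:k]) / sum(ranked)
    return (1.0 - retained) + penalty * k / d


def sign(v):
    return (v > 0) - (v < 0)


def bas_select_dimension(eigenvalues, iterations=100, step=2.0, antenna=1.0,
                         decay=0.95, penalty=0.1, seed=0):
    """Beetle Antennae Search on a continuous position x in [1, d]; the
    candidate PCA dimension is k = round(x). Returns the best k visited."""
    rng = random.Random(seed)
    d = len(eigenvalues)

    def clip(x):
        return min(max(x, 1.0), float(d))

    def f(x):
        return pca_dimension_score(round(clip(x)), eigenvalues, penalty)

    x = rng.uniform(1.0, float(d))        # random initial beetle position
    best_k, best_f = round(x), f(x)
    for _ in range(iterations):
        b = rng.choice((-1.0, 1.0))       # random unit direction (1-D case)
        f_right = f(x + antenna * b)      # "smell" at the right antenna
        f_left = f(x - antenna * b)       # "smell" at the left antenna
        # Minimization: move toward the antenna with the lower objective value.
        x = clip(x - step * b * sign(f_right - f_left))
        fx = f(x)
        if fx < best_f:
            best_k, best_f = round(x), fx
        step *= decay                     # shrink the step size
        antenna = decay * antenna + 0.01  # shrink the antennae, keep a floor
    return best_k


if __name__ == "__main__":
    # Hypothetical eigenvalues of an 8-attribute covariance matrix.
    print(bas_select_dimension([5.0, 3.0, 1.0, 0.5, 0.3, 0.1, 0.05, 0.05]))
```

With only a few candidate dimensions, simply enumerating every k is cheaper and exact. A search like BAS pays off mainly when the objective is expensive to evaluate or the search space has several dimensions.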
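The abstract does not specify the fuzzy decision rule of the optimized random forest. The sketch below is one hypothetical illustration of how a fuzzy, imbalance-aware vote could work:

1. Each tree's leaf class proportions are treated as fuzzy membership degrees.
2. The averaged degrees are rescaled by inverse class frequency.
3. The result is defuzzified by taking the largest degree.

`class_weights`, `fuzzy_forest_decision`, and the inverse-frequency weighting are assumptions for illustration, not the paper's algorithm.

```python
from collections import Counter


def class_weights(labels):
    """Hypothetical inverse-frequency weights w_c = N / (C * n_c):
    rarer classes receive larger weights."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}


def fuzzy_forest_decision(tree_memberships, weights):
    """Average per-tree fuzzy membership degrees (dict: class -> degree),
    rescale by class weight, normalize to sum to 1, then defuzzify by argmax."""
    classes = sorted(weights)
    n_trees = len(tree_memberships)
    avg = {c: sum(m.get(c, 0.0) for m in tree_memberships) / n_trees
           for c in classes}
    scaled = {c: avg[c] * weights[c] for c in classes}
    total = sum(scaled.values()) or 1.0   # guard against all-zero memberships
    degrees = {c: v / total for c, v in scaled.items()}
    label = max(classes, key=degrees.get)
    return label, degrees


if __name__ == "__main__":
    labels = [0] * 900 + [1] * 100        # 0 = benign, 1 = malware (imbalanced)
    trees = [{0: 0.8, 1: 0.2}, {0: 0.7, 1: 0.3}, {0: 0.9, 1: 0.1}]
    print(fuzzy_forest_decision(trees, class_weights(labels))[0])  # → 1
```

With uniform weights, the same three trees would vote benign (average memberships 0.8 against 0.2). Inverse-frequency weighting lets the minority malware class win this borderline case. Reweighting this strongly trades precision for recall, so any such scheme would typically need tuning on validation data.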
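Stacking trains the meta-classifier on out-of-fold predictions of the base classifiers, so that each sample's meta-features come from models that never saw it during training. The sketch below shows this cross-validation mechanism in plain Python. It rests on several assumptions:

- Logistic regression acts as the meta-learner. The abstract does not say which of the three models plays that role.
- `centroid_learner` and `threshold_learner` are hypothetical toy stand-ins for the optimized random forest and GBDT, which are not implemented here.
- Simple interleaved folds keep the example short. A real pipeline would typically use shuffled, stratified folds.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def centroid_learner(X, y):
    """Hypothetical toy base model (stand-in for the optimized random forest):
    scores a sample by its relative distance to the two class centroids."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])

    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict


def threshold_learner(X, y):
    """Hypothetical toy base model (stand-in for GBDT): thresholds feature 0
    at the midpoint of the class means, assuming class 1 has the larger mean."""
    f0 = [x[0] for x, t in zip(X, y) if t == 0]
    f1 = [x[0] for x, t in zip(X, y) if t == 1]
    thr = (sum(f0) / len(f0) + sum(f1) / len(f1)) / 2
    return lambda x: 1.0 if x[0] > thr else 0.0


def logistic_learner(Z, y, lr=0.5, epochs=1000):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    n, w, b = len(Z), [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda z: sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)


def out_of_fold_features(X, y, base_learners, k=5):
    """Meta-feature matrix: row i holds each base model's score for sample i,
    produced by a model trained on the other k - 1 folds."""
    n = len(X)
    meta = [[0.0] * len(base_learners) for _ in range(n)]
    for r in range(k):
        held_out = list(range(r, n, k))   # interleaved fold r
        held = set(held_out)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        for j, fit in enumerate(base_learners):
            predict = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = predict(X[i])
    return meta


def fit_stacking(X, y, base_learners, meta_learner, k=5):
    meta_model = meta_learner(out_of_fold_features(X, y, base_learners, k), y)
    full_models = [fit(X, y) for fit in base_learners]  # refit on all data
    return lambda x: meta_model([m(x) for m in full_models])


if __name__ == "__main__":
    X = [[float(v)] for v in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
    y = [0] * 5 + [1] * 5
    model = fit_stacking(X, y, [centroid_learner, threshold_learner], logistic_learner)
    print(model([13.0]), model([1.0]))
```

In practice, scikit-learn's `StackingClassifier` (through its `estimators`, `final_estimator`, and `cv` parameters) implements the same out-of-fold scheme for real base models.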
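The evaluation metrics named above can be computed as shown below. This is a generic sketch rather than the paper's evaluation code, and it assumes the following:

- Malware is the positive class (label 1).
- Hard predictions come from thresholding scores at a hypothetical 0.5.
- AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": roc_auc(y_true, scores),
    }


if __name__ == "__main__":
    # Hypothetical labels (1 = malware) and detector scores.
    print(classification_metrics([1, 1, 0, 0, 1, 0],
                                 [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]))
```

The pairwise AUC loop costs O(P·N) for P positives and N negatives. For large test sets, a rank-based computation after sorting the scores gives the same value in O(n log n).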