Research On Multiple Types Of Security Data Analysis Model Based On Machine Learning

Posted on: 2024-01-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Ma
GTID: 1528306944470254
Subject: Information security
Abstract/Summary:
With the rapid development of the Internet of Things, cloud computing, 5G, and other information network technologies, cyberspace has become the fifth strategic domain, alongside land, sea, air, and outer space. Cyberspace security plays a vital role in the stable development of cyberspace. The huge volume of security data collected in cyberspace characterizes its overall situation. Comprehensive analysis of this data can uncover latent patterns in massive network security data, which helps improve the effectiveness of traditional network security defenses and reduce the occurrence of network security threats. In recent years, the rapid development of emerging technologies such as artificial intelligence and big data analysis has significantly improved the intelligence and automation of multiclass security data analysis. However, some bottlenecks remain in specific applications. On the one hand, when various types of network security data are transformed into mathematical representations, existing methods require heavy human intervention, which leads to poor adaptability and robustness of security data analysis models. On the other hand, the construction of security data analysis models fails to take full advantage of the inherent structural characteristics of the data.

To address these issues, this thesis focuses on two types of security data analysis problems: analysis of plaintext security data and analysis of security data under privacy protection. It studies efficient and intelligent anomaly detection models and data fusion algorithms. Both techniques play a key role in providing data insight and a decision basis for network security management, and in protecting users’ privacy and information security. Specifically, the thesis studies anomaly detection models for network traffic data and a privacy-aware decentralized detection model. The main contributions are listed below.

1. To reduce the heavy human intervention in traffic data transformation, a traffic data transformation method is proposed based on automatic extraction of malicious keywords and a weight mapping mechanism. The raw traffic data is treated as natural language. First, malicious keywords are extracted automatically by defining a malicious degree measure for the strings in the traffic data. Then the raw traffic data is transformed into fixed-length feature vectors via a weight mapping mechanism based on the Gaussian function. The resulting vectors are used to train the subsequent machine-learning-based anomaly detection model for network traffic. Because the proposed method avoids heavy human intervention in the transformation process, it is adaptable and robust to traffic datasets from different sources. In addition, it can be combined with several machine learning models for anomaly detection in network traffic.

2. To address the problem that the intrinsic structure of data samples is grossly underused when constructing anomaly detection models, a linear anomaly detection model for network traffic is proposed based on the support vector machine and maximized intra-class similarity of the traffic data. This thesis constructs a non-convex optimization problem that maximizes the intra-class similarity of the data after a linear transformation, so that the separability of the different classes of training data is maximized. The resulting low-redundancy training dataset is fed as input vectors into the subsequent traffic classifier based on the support vector machine. By solving the proposed optimization problem, dimension reduction of the traffic dataset and training of the linear detection model are accomplished simultaneously. The proposed model outperforms benchmark models on the tested datasets, which were collected in a real environment. Numerical results indicate that the proposed model is effective and robust across different datasets.

3. To handle the fact that real traffic data is not linearly separable, and to reduce the complexity of hyper-parameter adjustment in anomaly detection model training, a novel model with automatic hyper-parameter adjustment is proposed for anomaly detection in network traffic based on the kernel support vector machine. Based on the dual formulation of the kernel support vector machine and the intra-class structure of the data samples, this thesis constructs an optimization model that tunes the hyper-parameter of the classifier and maximizes the intra-class similarity of the traffic data after the nonlinear mapping induced by the kernel trick. By solving the proposed optimization problem, automatic hyper-parameter adjustment and training of the nonlinear detection model are accomplished simultaneously. The resulting problem is one-dimensional, which reduces the computational cost of hyper-parameter adjustment. The proposed model outperforms benchmark models on the tested datasets, which were collected in a real environment. Extensive numerical results indicate the effectiveness and superiority of the proposed model.

4. To simultaneously protect users’ data privacy and inference privacy in decentralized detection, a novel privacy-aware model based on the inherent structure of the data and an adversarial learning framework is proposed for nonparametric decentralized detection in IoT networks. The utility-privacy trade-off is treated as a two-player game between the utility and privacy measures of the sensed data. The proposed model designs a privacy protection mechanism for sensors based on local differential privacy and data projection, so that both data privacy and inference privacy are guaranteed. This thesis formulates a max-min optimization problem to train the parameters of the privacy protection mechanism and the hypothesis detection rules. The problem maximizes the ratio of the minimized loss functions of the private and public hypotheses, subject to requirements on intra-class and inter-class similarity. The proposed model enhances data and inference privacy and improves the utility of the data produced by the privacy protection mechanism. Numerical results on four tested datasets demonstrate that the proposed model achieves a better utility-privacy trade-off than state-of-the-art methods.
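The traffic data transformation in contribution 1 can be illustrated with a minimal Python sketch. The abstract does not specify the malicious degree measure or how the Gaussian function is applied, so both are placeholders here and should not be read as the thesis's actual definitions: `malicious_degree` is assumed to be the difference in document frequency between malicious and benign samples, and `gaussian_weights` is assumed to map each keyword's score gap from the top score through a Gaussian with a hypothetical width `sigma`. All function names and the toy payloads are likewise illustrative.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_]+")


def tokenize(payload):
    """Treat raw traffic as text: split it into lowercase alphabetic tokens."""
    return [t.lower() for t in TOKEN_RE.findall(payload)]


def malicious_degree(malicious, benign):
    """Placeholder score (assumption): document frequency among malicious
    samples minus document frequency among benign samples."""
    mal_df = Counter(t for p in malicious for t in set(tokenize(p)))
    ben_df = Counter(t for p in benign for t in set(tokenize(p)))
    tokens = set(mal_df) | set(ben_df)
    return {t: mal_df[t] / len(malicious) - ben_df[t] / len(benign) for t in tokens}


def select_keywords(scores, k):
    """Keep the k highest-scoring tokens with a positive score (ties: alphabetical)."""
    candidates = [t for t, s in scores.items() if s > 0]
    return sorted(candidates, key=lambda t: (-scores[t], t))[:k]


def gaussian_weights(keywords, scores, sigma=0.5):
    """Placeholder weight mapping (assumption): a Gaussian of each keyword's
    score gap from the best-scoring keyword."""
    if not keywords:
        return {}
    s_max = max(scores[k] for k in keywords)
    return {k: math.exp(-((s_max - scores[k]) ** 2) / (2 * sigma**2)) for k in keywords}


def transform(payload, keywords, weights):
    """Map one payload to a fixed-length vector: weighted keyword counts."""
    counts = Counter(tokenize(payload))
    return [weights[k] * counts[k] for k in keywords]


# Toy labelled payloads, invented purely for illustration.
malicious = [
    "GET /index.php?id=1 UNION SELECT password FROM users",
    "GET /search?q=<script>alert(1)</script>",
    "GET /item?id=2 union select name from accounts",
]
benign = ["GET /index.php?id=1", "GET /search?q=shoes", "GET /item?id=2"]

scores = malicious_degree(malicious, benign)
keywords = select_keywords(scores, k=3)
weights = gaussian_weights(keywords, scores)

suspicious_vec = transform("GET /item?id=7 UNION SELECT card FROM payments", keywords, weights)
benign_vec = transform("GET /index.php?id=1", keywords, weights)
print(keywords, suspicious_vec, benign_vec)
```

Every payload maps to a vector of length `k` regardless of its own length, so the output can be passed to any standard classifier, which matches the abstract's remark that the transformation can be combined with several machine learning models.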
Keywords/Search Tags: security data analysis, machine learning, anomaly detection in network traffic, decentralized detection, privacy protection