Network Traffic Classification (NTC) and Network Intrusion Detection Systems (NIDS) are two parallel research lines in the field of Network Traffic Monitoring and Analysis (NTMA). Both aim to support network management tasks such as quality of service, billing, resource usage planning, and network security. They have long received considerable attention from industry and the research community, and many techniques, including classical machine learning (ML), are currently available. However, the emergence of technologies such as cloud computing, the Internet of Things (IoT), and 5G has led to an explosion in the volume and diversity of network traffic. As a result, Internet traffic has become more complex and diverse, which has greatly reduced the effectiveness of traditional NTMA techniques.

Deep learning (DL), with its ability to extract complex patterns from large-scale data, has been successful in various domains, and researchers in networking have begun applying DL models to NTMA tasks such as NTC and NIDS. DL models have a higher learning capacity and can learn relevant features directly from the input data, eliminating the need for manual feature selection by domain experts. This characteristic makes DL methods highly desirable for NTMA tasks. A significant challenge, however, is the lack of labeled or partially labeled network traffic data: collecting and labeling such data is time-consuming and computationally expensive because of the volume and velocity of the raw data generated daily. To address these challenges, this thesis proposes novel DL techniques that achieve high classification and detection accuracy using only a small labeled dataset.

The key contributions of this thesis are:

(1) A comprehensive taxonomy and survey of state-of-the-art techniques for NTC and NIDS.

(2) An identification and description of the challenges associated with applying DL techniques to NTC and NIDS.

(3) A novel encrypted traffic classification method. One challenge in classifying encrypted traffic is capturing and labeling enough encrypted traffic data to train DL classifiers. Current techniques rely on deep packet inspection (DPI) tools, which perform poorly on encrypted traffic. The thesis therefore proposes a semi-supervised learning method using generative adversarial networks (GANs). The basic idea is to use the samples generated by the GAN's generator, together with unlabeled encrypted traffic data, to improve the performance of a classifier trained on only a few labeled encrypted traffic samples.

(4) A weakly supervised anomaly detection method for network intrusion detection. Unsupervised anomaly-based network intrusion detection techniques offer a promising alternative when large-scale malicious samples for training supervised methods are difficult to acquire. However, these techniques learn representations of normal traffic and flag deviations from this normal behavior as anomalies. Because they focus solely on preserving data regularity, they may not capture the distinctive characteristics of anomalies, and they often suffer from a high false alarm rate. To address this limitation, the thesis proposes a weakly supervised anomaly detection method using GANs. The basic idea is to include both normal instances and a few anomalous instances in the reconstruction learning objective, explicitly enforcing higher reconstruction error for anomalous instances and lower reconstruction error for normal ones. This enables the model to learn representations tailored to the anomalies of interest rather than to irrelevant data noise.

(5) A network intrusion detection method using few-shot learning. Collecting a large-scale dataset for network intrusion detection is challenging because network traffic is dynamic and the cybersecurity landscape is constantly evolving. The thesis therefore proposes a transfer-learning-based few-shot network intrusion detection method that learns from only a few labeled examples. We first train a feature extractor using discriminative representation learning with a supervised autoencoder, and then train a classifier on top of the learned feature extractor, which enables the model to generalize from few examples.

The techniques presented in this thesis address some of the key challenges in employing DL for NTMA tasks, such as achieving high classification accuracy with limited labeled data.
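The semi-supervised GAN idea in contribution (3) is commonly formulated with a discriminator that scores K real traffic classes, with "generated" treated as an implicit extra class. The NumPy sketch below computes such a discriminator loss. It assumes this widely used (K+1)-class formulation, not necessarily the thesis's exact model, and the function and argument names are hypothetical. Labeled flows contribute a cross-entropy term, unlabeled flows are pushed toward "real", and generator samples are pushed toward "generated".

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along the class axis.
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))).squeeze(axis)

def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)

def sgan_discriminator_loss(logits_lab, labels, logits_unl, logits_gen):
    """Hypothetical (K+1)-class discriminator loss (assumed formulation).

    logits_*: arrays of shape (n, K) over the K real traffic classes.
    The 'generated' class is implicit: D(x) = Z / (Z + 1), Z = sum(exp(logits)).
    """
    # Supervised term: cross-entropy on the few labeled encrypted flows.
    lse_lab = logsumexp(logits_lab)
    sup = np.mean(lse_lab - logits_lab[np.arange(len(labels)), labels])
    # Unsupervised term, real unlabeled flows: -log D(x) = softplus(lse) - lse.
    lse_unl = logsumexp(logits_unl)
    unsup_real = np.mean(softplus(lse_unl) - lse_unl)
    # Unsupervised term, generator samples: -log(1 - D(G(z))) = softplus(lse).
    lse_gen = logsumexp(logits_gen)
    unsup_fake = np.mean(softplus(lse_gen))
    return sup + unsup_real + unsup_fake
```

Because the real-class softmax over the same logits serves as the traffic classifier, the unlabeled and generated samples shape the decision boundaries without requiring extra labels.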
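The weakly supervised objective in contribution (4) can be illustrated on the reconstruction term alone. The sketch below assumes a hinge formulation: normal samples minimize their reconstruction error, and each labeled anomaly is penalized until its error exceeds a margin. The hinge, the `margin` value, and the function names are illustrative assumptions rather than the thesis's GAN loss.

```python
import numpy as np

def weakly_supervised_recon_loss(x, x_hat, is_anomaly, margin=5.0):
    """Hypothetical reconstruction objective using a few labeled anomalies.

    x, x_hat: arrays of shape (n, d) (inputs and reconstructions).
    is_anomaly: boolean array of shape (n,); True marks the few known anomalies.
    margin: assumed minimum reconstruction error required for anomalies.
    """
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    err = np.mean((x - x_hat) ** 2, axis=1)  # per-sample reconstruction error
    normal_err = err[~is_anomaly]
    anomal_err = err[is_anomaly]
    # Normal traffic: drive reconstruction error down.
    loss_normal = normal_err.mean() if normal_err.size else 0.0
    # Labeled anomalies: penalize any error that falls below the margin.
    loss_anom = np.maximum(0.0, margin - anomal_err).mean() if anomal_err.size else 0.0
    return loss_normal + loss_anom

def anomaly_score(x, x_hat):
    # At detection time, a higher reconstruction error means more anomalous.
    return np.mean((x - x_hat) ** 2, axis=1)
```

Unlike a purely unsupervised autoencoder, this objective gives the model an explicit signal that the known anomalies must not be reconstructed well, which is the intuition behind the reduced false alarm rate.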
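The two-stage procedure in contribution (5) can be sketched as follows. Stage 1 trains a supervised autoencoder whose loss combines reconstruction error with a cross-entropy term on the latent code, which makes the learned features discriminative. Stage 2 freezes the encoder and fits a classifier on only a few labeled examples. The weighting `lam`, the assumption that `logits` come from a softmax head on the latent code, and the nearest-centroid classifier in stage 2 are all hypothetical choices, because the abstract does not specify them. The encoder and decoder themselves are omitted.

```python
import numpy as np

def supervised_ae_loss(x, x_hat, logits, labels, lam=1.0):
    """Stage 1 (assumed form): reconstruction + lam * cross-entropy on latent logits."""
    recon = np.mean((x - x_hat) ** 2)
    # Stable log-softmax over the classes predicted from the latent code.
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])
    return recon + lam * ce

def fit_nearest_centroid(features, labels):
    """Stage 2 (hypothetical stand-in): class means of frozen encoder features."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, features):
    # Assign each sample to the class with the nearest centroid (squared L2).
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```

A parameter-free classifier such as nearest centroid is a natural fit when only a handful of labeled attack samples exist, since it has nothing to overfit beyond the class means; a small trained classifier head would follow the same two-stage pattern.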