| In recent years, data exfiltration attacks have caused heavy losses to the enterprises and institutions they target. Because the DNS protocol is well concealed and readily penetrates network boundaries, DNS-based data exfiltration has become a common TTP (tactic, technique, and procedure) among APT organizations. Enterprises and institutions therefore need a defensive capability that monitors DNS traffic at the network boundary and accurately detects potential attack behavior. AI-enabled techniques can effectively strengthen such defenses, but the scale and quality of the available attack data constrain the training of AI detection models and have become an important limit on model performance. In particular, datasets of DNS-based APT campaigns and malicious samples are hard to obtain, few in number, and low in activity. Moreover, data augmentation techniques used in other fields do not transfer well to such a semantically sensitive domain. Now that AI algorithms have matured and computing power has greatly improved, this thesis addresses the scarcity and incompleteness of the datasets available for model training. Its main work and contributions are as follows: 1. This thesis proposes a novel TTP-based method for automatically generating and applying attack data. The generation method is grounded in attack principles, which ensures the validity and completeness of the generated data and gives defenders more initiative in network defense. The thesis also designs an application scheme in which generated data are used to train the AI model and real attack data are used to verify its performance, promoting the construction of AI-equipped network defense capabilities. 2. The thesis summarizes the current state of DNS-based exfiltration attacks and comprehensively studies the attack mechanism. Drawing on reports of a large number of real APT attack cases and on existing DNS-based exfiltration tools, it identifies four key techniques: data embedding and recovery, encoding conversion, DNS-based exfiltration transmission, and policy-based response. This provides the theoretical basis for the automatic generation system that follows and a technical reference for researchers studying DNS-based exfiltration campaigns. 3. The Mal DNS system is designed and implemented on the basis of the TTPs of DNS-based exfiltration attacks, and generates large-scale, high-fidelity DNS-based exfiltration attack datasets with adjustable completeness. The system provides a complete DNS-based exfiltration framework and, through flexible configuration items, good extensibility. It can therefore not only closely reproduce the DNS-based exfiltration attacks described in existing case reports, but also anticipate unknown DNS-based exfiltration attacks that fall within the scope of the attack principles. 4. Data evaluation experiments assess the generated data in terms of validity and application effect. On the one hand, validity can be evaluated directly by whether the Mal DNS system completes the exfiltration task, which also makes the data easy to validate and label. On the other hand, when an AI model is trained on a dataset that incorporates the generated data, the resulting detection model detects real DNS-based exfiltration attacks with an accuracy above 99% and almost no false positives. The experiments indicate that the generated traffic data are valid and can effectively support the training of detection models, and that the resulting model performs well in detecting real DNS-based exfiltration attacks. |
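
To make the "data embedding and recovery" and encoding steps named in the abstract more concrete, the following is a minimal offline sketch in Python. It is an illustration under stated assumptions, not the Mal DNS system's actual format: the function names `embed` and `recover`, the Base32 encoding, the `<sequence>.<data>.<domain>` label layout, and the domain `example.test` are all hypothetical choices made for this example. No DNS traffic is sent; the code only builds and parses query-name strings of the kind a detector would need to label.

```python
import base64

MAX_LABEL = 63  # RFC 1035 limit on the length of a single DNS label


def embed(payload: bytes, domain: str, chunk_size: int = 30) -> list[str]:
    """Split payload into chunks and encode each one as a query name.

    Assumed layout (hypothetical): <sequence>.<base32 chunk>.<domain>
    """
    names = []
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        chunk = payload[start:start + chunk_size]
        # Base32 keeps the data within the case-insensitive hostname alphabet.
        label = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        if len(label) > MAX_LABEL:
            raise ValueError("chunk_size too large for one DNS label")
        names.append(f"{seq}.{label}.{domain}")
    return names


def recover(names: list[str], domain: str) -> bytes:
    """Rebuild the payload from query names, tolerating out-of-order arrival."""
    suffix = "." + domain
    chunks = {}
    for name in names:
        if not name.endswith(suffix):
            continue  # not part of this (hypothetical) channel
        seq, label = name[: -len(suffix)].split(".", 1)
        # Restore the Base32 padding that was stripped during embedding.
        padded = label.upper() + "=" * (-len(label) % 8)
        chunks[int(seq)] = base64.b32decode(padded)
    return b"".join(chunks[i] for i in sorted(chunks))


if __name__ == "__main__":
    data = b"example record " * 5
    queries = embed(data, "example.test")
    for q in queries:
        print(q)
    print(recover(list(reversed(queries)), "example.test") == data)
```

Because the round trip either reproduces the payload or fails, a generator built along these lines can label its own output, which is in the spirit of the abstract's point that validity can be checked by whether the exfiltration task completes.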