| With the development of big data, artificial intelligence, 5G, and other emerging technologies, industrial production processes are becoming increasingly digital, automated, networked, and intelligent. At the same time, cyber-attacks are increasingly targeting critical national infrastructure and industrial control systems in sectors such as electric power, water conservancy, and telecommunications. As attacks on industrial control systems become more frequent, the security of industrial control protocols has become particularly important. Protocol format reverse extraction (also known as format inference) infers an unknown protocol format by analyzing network traces or program execution traces. It can be used to obtain the specifications of industrial control protocols automatically, and in turn supports intelligent fuzzing of industrial control network protocols and fine-grained intrusion detection. However, current research on protocol format reverse extraction focuses mainly on Internet protocols and text-based messages, and is difficult to apply to industrial control protocols, whose messages are binary. This dissertation addresses the limitations of current protocol reverse engineering techniques for industrial control protocols, proposes a network trace-based reverse analysis algorithm tailored to the characteristics of these protocols, and designs and implements a prototype system for reversing industrial control protocols. The main contributions of this dissertation are as follows: (1) An industrial control protocol clustering algorithm, LDMP, is proposed, based on the LDA topic model and improved density peak clustering. Industrial control protocol messages are defined as binary data streams with ambiguous semantic features, and existing protocol message clustering algorithms cannot select cluster centers automatically, cannot generate clusters automatically, and are not accurate enough. To address these problems, the LDA topic model is used to extract keywords from industrial control protocol messages after N-gram splitting, and the keywords and their associated probabilities serve as message features. The improved density peak clustering algorithm then identifies cluster centers automatically, which effectively improves the separation between cluster centers and the sample points of other messages. In tests on messages of three common protocols (Modbus, DNP3, and S7), the purity and F1 score of the algorithm exceed 90%. Compared with the K-means and Netzob methods, the F1 score of LDMP is 10 and 27 percentage points higher, respectively, across the three protocols. The algorithm therefore clusters industrial protocol messages more effectively and can distinguish messages with different functions. (2) A format extraction method for industrial control protocol messages is proposed, based on progressive multiple sequence alignment and multivariate statistics. Existing format extraction techniques, when applied to industrial control protocols, have difficulty locating field boundaries and suffer from rigid templates in sequence alignment. To address these problems, progressive multiple sequence alignment is used to align industrial control protocol messages, multivariate statistical analysis is applied to message boundary information using variance and Shannon entropy, and a division degree measure is proposed to split and merge message segments and extract the message formats. Experimental results show that the correctness and conciseness of format extraction exceed 90% for the DNP3 protocol and 85% for the Modbus protocol, an average improvement of 10 percentage points in correctness and conciseness over the PI and Netzob methods. (3) ICSPRE, a prototype system for reversing industrial control protocols, is implemented. The effectiveness of the proposed method is verified by comparing the message formats extracted by ICSPRE with those parsed by Wireshark for two typical industrial control protocols, Modbus and DNP3. |
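
As an illustration of the boundary analysis summarized in contribution (2), the following minimal sketch computes the Shannon entropy and variance at each byte offset of a set of already aligned messages. It is not the dissertation's implementation: the function names `column_stats` and `boundary_candidates`, the 0.5-bit threshold, and the sample frames (loosely modeled on a Modbus/TCP header) are all assumptions made for this example, and the division degree measure itself is not reproduced here.

```python
import math
from collections import Counter


def column_stats(messages: list[bytes]) -> list[tuple[float, float]]:
    """Return (Shannon entropy in bits, variance) for each byte offset.

    Assumes the messages are already aligned; only offsets present in
    every message are considered.
    """
    width = min(len(m) for m in messages)
    stats = []
    for i in range(width):
        column = [m[i] for m in messages]
        total = len(column)
        counts = Counter(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        mean = sum(column) / total
        variance = sum((v - mean) ** 2 for v in column) / total
        stats.append((entropy, variance))
    return stats


def boundary_candidates(stats: list[tuple[float, float]], threshold: float = 0.5) -> list[int]:
    """Return offsets whose entropy differs from the previous offset by more than threshold.

    The threshold is a hypothetical value chosen for this example.
    """
    return [
        i
        for i in range(1, len(stats))
        if abs(stats[i][0] - stats[i - 1][0]) > threshold
    ]


# Illustrative frames only: a counter in byte 1, constant bytes elsewhere.
messages = [
    bytes([0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03]),
    bytes([0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03]),
    bytes([0x00, 0x03, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03]),
    bytes([0x00, 0x04, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03]),
]

stats = column_stats(messages)
candidates = boundary_candidates(stats)
```

In this toy data, constant offsets have zero entropy and the counter byte has high entropy, so entropy jumps mark the edges of the varying field. Real traces would need alignment first and a more careful rule for merging adjacent segments.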