| Process mining is an important bridge between data mining and business process management, together with the related optimization research. Its main goal is to extract process-related information from the event data in a log in order to build the corresponding business process model. Research in process mining covers three main tasks: model discovery, conformance checking and enhancement. Conformance checking is an important and challenging topic in process mining. It detects deviations of the process log from the process model, so it is widely used in organizational compliance analysis and other areas of business process management. Most existing conformance checking methods are based on rule checking, token replay or alignment. These methods assume that the process reference model is known, and they rarely consider how to measure consistency effectively from the event log alone, especially when the system model is not known in advance. Moreover, the latest conformance checking methods have comparatively high time complexity and are not suitable for some large-scale log scenarios. In addition, many practical applications do not need an exact consistency value; an approximate value is often enough to meet the requirements. It is therefore very important to carry out efficient approximate conformance checking when the log is large or the reference model is unknown. This paper applies trace clustering and machine learning techniques to approximate conformance checking. It first summarizes the research on trace clustering, then proposes a consistency detection method based on a machine learning classification vector, and finally proposes a conformance checking method based on trace clustering, which can efficiently compute an approximate conformance value when the system model is not known in advance.
The main contents of this paper are as follows. (1) Trace clustering algorithms from the past 15 years were studied and analyzed, with trace clustering methods in process mining as the main object of analysis. The state-of-the-art literature was analyzed, summarized and discussed from the perspectives of clustering technique classification, trace encoding methods, application scenarios of trace clustering, etc. Three types of public benchmark datasets were selected for experiments with different trace clustering methods. (2) To address the limitation that most existing studies evaluate the degree of consistency on the assumption that the system reference model is known, a deep learning method for log-to-log trace consistency evaluation was proposed. The traces in the log training set were labeled with two classes (0 and 1), and the traditional classifiers KNN, random forest, QDA and LDA were compared with the sequence-based classifiers GRU and LSTM. The method with the highest classification accuracy was selected for log preprocessing, and the classification vector was obtained from it. The classification vector was then integrated into the log training set to fit the consistency of the test traces. The experimental results show that adding the deep learning classification vector to the training set yields a higher consistency fitting index than training without it.
(3) Because existing approximate conformance checking methods do not exploit the similarity between traces to improve the conformance calculation, this paper proposes using trace clustering for approximate conformance checking. The event log is encoded, and trace clustering is then used to divide it into different behavioral subsets, so that the traces within each subset are behaviorally similar. The traces in each behavioral subset are then encoded separately, and finally a convolutional neural network is used for fitting to obtain the approximate consistency value. Figure [19] Table [14] Reference [97]... |
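
The first two steps of contribution (3), encoding the event log and dividing it into behavioral subsets, can be illustrated with a minimal sketch. The abstract does not say which encoding or clustering algorithm the thesis uses, so the bag-of-activities encoding and the plain k-means below are assumptions chosen for brevity, and the names `encode`, `kmeans` and `cluster_log` are hypothetical. The final convolutional-network fitting step is not shown.

```python
from collections import Counter


def encode(trace, alphabet):
    """Bag-of-activities vector: how often each activity occurs in the trace.
    (Assumed encoding; the abstract does not name the one actually used.)"""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k distinct vectors
    so that the result is deterministic."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_log(log, k):
    """Encode every trace, cluster the vectors, and return the
    behavioral subsets as {cluster label: [traces]}."""
    alphabet = sorted({activity for trace in log for activity in trace})
    vectors = [encode(trace, alphabet) for trace in log]
    labels = kmeans(vectors, k)
    subsets = {}
    for trace, label in zip(log, labels):
        subsets.setdefault(label, []).append(trace)
    return subsets


# Toy event log: each trace is a sequence of activity names.
log = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["a", "c", "b"],
    ["x", "z", "y"],
    ["a", "b", "b", "c"],
]
subsets = cluster_log(log, k=2)
```

On this toy log the traces built from activities a, b, c land in one subset and those built from x, y, z in the other. A bag-of-activities vector ignores the order of events, so an order-aware encoding would be needed wherever ordering matters for conformance.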