Many real-world business processes are executed in highly flexible environments, and such flexible configurations can produce densely distributed process instances with multiple complex behaviors. In these scenarios, applying process discovery algorithms directly generates incomprehensible, spaghetti-like models. To reduce model complexity and improve log quality, trace clustering and log repair are two commonly used solutions. Trace clustering divides the observed behaviors into several sub-logs, each containing similar behaviors, so that cases in the same sub-log usually come from the same scenario. Most commonly used trace clustering methods rely on a single criterion, such as the control-flow relations of the log, while ignoring the behavior relations between activities and other attributes such as time and resources. This makes it difficult to improve the quality of process mining for flexibly configured business process systems. This paper therefore combines other log attributes to explore trace clustering from multiple perspectives in order to reduce heterogeneity, and applies the proposed trace clustering method to log repair to improve log quality. The main research contents of this paper are as follows.

(1) A multi-perspective trace clustering method combining activity behavior relations and associated time is proposed. First, a control-flow encoding is constructed from the behavior relations between activities; in parallel, for the time attribute, each trace is represented as a set of nearest associated activity pairs together with their time differences. Second, weighted aggregation integrates the trace similarities from the two perspectives, and the clustering is then adjusted. Finally, the proposed method is applied to a login-system scenario and
compared with other clustering methods on five real logs. The experimental results show that the method can discover process scenarios in complex login systems, and its advantages are verified on three metrics: fitness, accuracy, and F1 score.

(2) A hybrid feature clustering method combining the sequence patterns of traces with resource attributes is proposed. First, sequence mining is used to extract frequent sequence patterns from the event log, and control-flow similarity is calculated from the types of sequence patterns each trace contains. Second, for resource attributes, similarity is calculated from the frequent itemsets of resources each trace contains. The two perspectives are then weighted and combined. Finally, the feasibility of the method is verified on a case study and a real log.

(3) To address missing values in event logs, this paper proposes a trace-clustering-based method for repairing multiple consecutive missing values. First, the event log is divided according to trace completeness, and a clustering algorithm is applied to the complete log to generate homogeneous trace clusters. Next, each incomplete trace is matched to the most similar sub-log, candidate sequences are generated from the context of the missing part, and the context probability of each candidate is calculated; the candidate with the highest probability is selected as the repair result. Finally, the feasibility of the method is verified on four event logs with different missing ratios, and the experimental results show that the proposed method has some advantages over existing methods.

Figure [18] Table [17] Reference [89]...
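The two-perspective similarity in method (1) can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions. The behavior-relation encoding is approximated by directly-follows pair counts, and "associated activity pairs" are taken to be adjacent events. The closeness score 1/(1 + |dt1 - dt2|) and the weight `w` are hypothetical choices. Average-linkage agglomerative clustering stands in for the unspecified clustering step, and the cluster-adjustment step is omitted. All function names are illustrative.

```python
from collections import Counter
from math import sqrt

# Assumed representation: a trace is a list of (activity, timestamp) events,
# with timestamps as plain numbers (e.g. seconds since the case started).

def control_flow_code(trace):
    """Hypothetical behavior-relation encoding: counts of directly-follows pairs."""
    acts = [a for a, _ in trace]
    return Counter(zip(acts, acts[1:]))

def time_profile(trace):
    """Hypothetical time encoding: mean time difference of each adjacent activity pair."""
    sums, counts = Counter(), Counter()
    for (a, ta), (b, tb) in zip(trace, trace[1:]):
        sums[(a, b)] += tb - ta
        counts[(a, b)] += 1
    return {pair: sums[pair] / counts[pair] for pair in counts}

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def time_similarity(p, q):
    """Closeness 1 / (1 + |dt1 - dt2|) for shared pairs, averaged over all pairs;
    pairs present in only one trace contribute 0."""
    keys = p.keys() | q.keys()
    if not keys:
        return 1.0
    return sum(1.0 / (1.0 + abs(p[k] - q[k])) for k in p.keys() & q.keys()) / len(keys)

def combined_similarity(t1, t2, w=0.5):
    """Weighted aggregation of control-flow and time similarity (w is an assumed weight)."""
    cf = cosine(control_flow_code(t1), control_flow_code(t2))
    tm = time_similarity(time_profile(t1), time_profile(t2))
    return w * cf + (1 - w) * tm

def cluster(traces, k, w=0.5):
    """Average-linkage agglomerative clustering on the combined similarity,
    merging the most similar pair of clusters until k clusters remain."""
    n = len(traces)
    sim = [[combined_similarity(traces[i], traces[j], w) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                s /= len(clusters[x]) * len(clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        x, y = pair
        clusters[x] += clusters.pop(y)  # y > x, so index x is unaffected
    return clusters

# Toy login log: a direct-login variant and an SMS-verification variant.
traces = [
    [("login", 0), ("verify", 5), ("home", 6)],
    [("login", 0), ("verify", 6), ("home", 7)],
    [("login", 0), ("sms", 30), ("verify", 90), ("home", 91)],
    [("login", 0), ("sms", 32), ("verify", 95), ("home", 96)],
]
print(cluster(traces, 2))  # [[0, 1], [2, 3]]
```

Because the closeness score depends on the timestamp unit, time differences would likely need normalizing before they are weighted against the control-flow similarity.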
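Method (2) can be sketched in the same hedged way. The following substitutions are assumptions, not the paper's technique. Contiguous activity n-grams with a support threshold stand in for a full sequential-pattern miner (such as PrefixSpan). A brute-force count of resource sets up to size 2 stands in for frequent-itemset mining (such as Apriori). Jaccard similarity over the patterns and itemsets each trace contains is an assumed similarity measure. All names and the parameters `min_support` and `w` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed representation: a trace is a list of (activity, resource) events.

def ngrams(seq, n):
    """Set of contiguous length-n subsequences of seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def frequent_sequence_patterns(traces, min_support, lengths=(2, 3)):
    """Simplified sequence mining: contiguous activity n-grams found in at least
    min_support (a fraction) of the traces."""
    counts = Counter()
    for t in traces:
        acts = [a for a, _ in t]
        for n in lengths:
            counts.update(ngrams(acts, n))  # each pattern counted once per trace
    return {p for p, c in counts.items() if c / len(traces) >= min_support}

def frequent_resource_itemsets(traces, min_support, max_size=2):
    """Brute-force frequent itemsets over the set of resources used by each trace."""
    counts = Counter()
    for t in traces:
        resources = sorted({r for _, r in t})
        for k in range(1, max_size + 1):
            counts.update(frozenset(c) for c in combinations(resources, k))
    return {i for i, c in counts.items() if c / len(traces) >= min_support}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def hybrid_similarity(traces, min_support=0.5, w=0.5):
    """Pairwise matrix: w * Jaccard(sequence patterns) + (1 - w) * Jaccard(resource itemsets)."""
    seq_pats = frequent_sequence_patterns(traces, min_support)
    res_sets = frequent_resource_itemsets(traces, min_support)
    seq_feats, res_feats = [], []
    for t in traces:
        acts = [a for a, _ in t]
        contained = set()
        for n in {len(p) for p in seq_pats}:
            contained |= ngrams(acts, n) & seq_pats
        seq_feats.append(contained)
        resources = {r for _, r in t}
        res_feats.append({i for i in res_sets if i <= resources})
    n = len(traces)
    return [[w * jaccard(seq_feats[i], seq_feats[j]) + (1 - w) * jaccard(res_feats[i], res_feats[j])
             for j in range(n)] for i in range(n)]

traces = [
    [("a", "r1"), ("b", "r2"), ("c", "r2")],
    [("a", "r1"), ("b", "r2"), ("c", "r3")],
    [("a", "r4"), ("d", "r5"), ("c", "r5")],
    [("a", "r4"), ("d", "r5"), ("c", "r3")],
]
S = hybrid_similarity(traces)
print(round(S[0][1], 3), S[0][2])  # 0.875 0.0
```

The resulting matrix could then be handed to any similarity-based clustering algorithm; the sketch stops at the weighted combination of the two perspectives.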
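The repair procedure in method (3) might look roughly like the sketch below. It is a minimal illustration, not the paper's algorithm, and it makes several assumptions. Missing activities are marked with `None`, and each incomplete trace contains one run of consecutive gaps. Sub-log matching uses the average Jaccard overlap of directly-follows pairs. The "context" is the known activity immediately before and after the gap. A candidate's context probability is estimated as its relative frequency among matching windows in the chosen sub-log. The clustering of the complete log is replaced by a hard-coded split. All names are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing activity label

def split_by_completeness(log):
    """Divide the log into complete traces and traces with missing values."""
    return [t for t in log if MISSING not in t], [t for t in log if MISSING in t]

def gap(trace):
    """Locate the first run of consecutive missing events as (start, length)."""
    start = end = trace.index(MISSING)
    while end < len(trace) and trace[end] is MISSING:
        end += 1
    return start, end - start

def dfg(trace):
    """Directly-follows pairs among known, adjacent events."""
    return {(a, b) for a, b in zip(trace, trace[1:]) if a is not MISSING and b is not MISSING}

def most_similar_sublog(trace, sublogs):
    """Sub-log whose traces share the most directly-follows pairs on average."""
    pairs = dfg(trace)
    def score(sub):
        return sum(len(pairs & dfg(t)) / max(len(pairs | dfg(t)), 1) for t in sub) / len(sub)
    return max(sublogs, key=score)

def repair(trace, sublogs):
    """Fill the gap with the most probable candidate seen between the same
    before/after context in the closest sub-log; returns (trace, probability)."""
    p = ["<s>"] + list(trace) + ["</s>"]  # sentinels give a context at trace ends
    start, m = gap(p)
    before, after = p[start - 1], p[start + m]
    candidates = Counter()
    for t in most_similar_sublog(trace, sublogs):
        q = ["<s>"] + list(t) + ["</s>"]
        for i in range(len(q) - m - 1):
            if q[i] == before and q[i + m + 1] == after:
                candidates[tuple(q[i + 1:i + m + 1])] += 1
    if not candidates:
        return list(trace), 0.0
    best, count = candidates.most_common(1)[0]
    # Index `start` in p corresponds to index `start - 1` in the original trace.
    repaired = list(trace[:start - 1]) + list(best) + list(trace[start - 1 + m:])
    return repaired, count / sum(candidates.values())

log = [
    ["a", "b", "c", "d", "e"], ["a", "b", "c", "d", "e"], ["a", "b", "x", "y", "e"],
    ["p", "q", "r", "s"], ["p", "q", "r", "s"],
    ["a", "b", None, None, "e"],
]
complete, incomplete = split_by_completeness(log)
sublogs = [complete[:3], complete[3:]]  # stands in for clustering the complete log
print(repair(incomplete[0], sublogs))  # (['a', 'b', 'c', 'd', 'e'], 0.6666666666666666)
```

Using a single activity on each side of the gap keeps the example short; a longer context window would likely discriminate better between candidates at the cost of fewer matching windows.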