| The development of informatization has turned traditional paper-based business processes into digital processes supported by information systems. The arrival of the big data era means that a large amount of business data is recorded in the information systems of various industries, including the systems' event logs. Process mining aims to discover a reasonable business process model and to improve the implemented business processes by analyzing event logs. However, much research in process mining assumes that processes are in a steady state, whereas practical experience confirms that business processes evolve over time. Therefore, the event logs, which record the behavior of the system, also evolve over time. Existing research calls this phenomenon concept drift. In order to mine high-quality process models from evolving logs, this thesis follows the sequence from log pre-processing to model mining and mainly studies four issues: sudden drift detection, evaluation after log partitioning, the mining of length-two loop structures, and the mining of complex loop structures. Specifically, the main contents of the thesis are as follows: (1) A novel online drift detection algorithm is proposed to address the low accuracy of existing methods and their unsuitability for online scenarios. First, an equation derived from Chebyshev's inequality is used to determine the size of the completeness window for drift detection. Next, we apply a divide-and-conquer strategy to divide the drift features into new features and disappeared features. In addition, after detecting a drift point, we use a forgetting mechanism to discard the entire feature set and move the detection window. When these iterations finish, all drift points in the log have been detected. (2) After the log is partitioned based on the drift points, we present a new method for evaluating the detection methods when no ground truth is available. First, we adapt information entropy to define the internal entropy, which is used to measure the internal disorder of each sub-log
after partitioning. Second, relative entropy is used to calculate the distance between the sub-logs. Finally, a penalty factor is used to penalize the bias and sensitivity of the algorithm. (3) To address the problem that existing algorithms cannot mine length-two loops from logs that lack the “aba” pattern, we put forward a new mining algorithm. First, we formally define the behavioral characteristics of length-two loops, and then distinguish length-two loops from concurrent structures from a global perspective. Second, length-two loops of the same type on the concurrent branches of the process model are accurately divided by the follow degree value, so that loops satisfying the behavioral characteristics fall into one set. (4) To avoid the influence of variants of length-two loops, as well as of long loops on concurrent branches, an algorithm for mining complex loops is proposed. Initially, we abstract the variant structures and the long-loop structures based on the order vector. The order vector, which is extracted from the event logs, is used to depict the behavioral relationship of each pair of activities. Finally, we restore the abstractions of the variant structures and long-loop structures and discover a complete process model. We implement the above algorithms and conduct a comprehensive experimental evaluation of their feasibility using a large number of artificial event logs and real-life event logs from corporations. |
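
As a rough illustration of the entropy-based evaluation in contribution (2), the following Python sketch computes an internal entropy for each sub-log and a relative entropy between two sub-logs. The abstract does not give the thesis's exact definitions, so the sketch makes its own assumptions: each sub-log is summarized by the distribution of its directly-follows activity pairs, internal entropy is the Shannon entropy of that distribution, relative entropy is the Kullback-Leibler divergence with add-alpha smoothing, and the penalty factor is omitted. All function names and the toy logs are hypothetical.

```python
import math
from collections import Counter


def directly_follows_counts(log):
    """Count directly-follows pairs (a, b) over all traces of a log.

    Assumption: a log is a list of traces, each trace a list of activity names.
    """
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def internal_entropy(log):
    """Shannon entropy (in bits) of the directly-follows distribution of one sub-log."""
    counts = directly_follows_counts(log)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy(log_p, log_q, alpha=1.0):
    """KL divergence D(P || Q) in bits between two sub-logs.

    Add-alpha smoothing over the union of observed pairs keeps the divergence
    finite when a pair occurs in only one sub-log (an assumption of this sketch).
    """
    counts_p = directly_follows_counts(log_p)
    counts_q = directly_follows_counts(log_q)
    support = set(counts_p) | set(counts_q)
    if not support:
        return 0.0
    norm_p = sum(counts_p.values()) + alpha * len(support)
    norm_q = sum(counts_q.values()) + alpha * len(support)
    divergence = 0.0
    for pair in support:
        p = (counts_p[pair] + alpha) / norm_p
        q = (counts_q[pair] + alpha) / norm_q
        divergence += p * math.log2(p / q)
    return divergence


# Toy sub-logs on either side of a hypothetical drift point.
before = [["a", "b", "c", "d"], ["a", "c", "b", "d"]] * 10
after = [["a", "b", "d"], ["a", "c", "d"]] * 10

print(internal_entropy(before))          # entropy of the first sub-log
print(internal_entropy(after))           # entropy of the second sub-log
print(relative_entropy(before, after))   # distance between the two sub-logs
```

In this sketch, a larger divergence between adjacent sub-logs suggests that the partition separates behaviorally different segments. How the thesis combines the two quantities with its penalty factor is not stated in the abstract, so no combined score is shown.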