Noise Classification Of Process Event Logs

Posted on: 2021-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Dai
Full Text: PDF
GTID: 2512306512987689
Subject: Computer software and theory
Abstract/Summary:
Many business intelligence applications, such as process mining, provenance analysis, and complex event processing, rely on event logs of business processes for decision making. The real-world event sequences recorded in a log inevitably contain noise, which can significantly degrade the quality of the event log and of the decisions based on it. Identifying event sequences with different types of deviations, including redundant, missing, or dislocated events, supports log preprocessing, log repair, and fault diagnosis.

Existing approaches rely on predefined process models and determine deviations by trace-model alignment. However, process models are not always available in practice. Even when they are available, most existing approaches do not scale because trace-model alignment is NP-hard.

This paper proposes a machine learning method to address this problem. The method requires a training set composed of event sequences whose deviation types are known, and it consists of two stages. First, the training set is divided into a set of event sequence clusters. Second, the class labels of an event sequence are predicted from a subset of the overall training set. Experiments on real-world event logs demonstrate that the approach can effectively and efficiently predict and classify noise without the help of process models.

The main work of this paper is as follows:

(1) Based on a set of labelled event sequences, this paper formulates the problem of determining whether an unseen event sequence contains deviations, and which types of deviations it may involve, as a classification problem.

(2) This paper presents algorithms that divide the training set into several clusters of event sequences. To classify an unlabeled event sequence, it recommends using only a few clusters instead of the whole training set for instance-based learning, which improves efficiency.

(3) Based on the traditional string edit distance, this paper defines an edit distance between event sequences of business processes that takes the concurrency relationship between events into account. The similarity between two event sequences is measured by the minimum number of edit operations required to convert one event sequence into the other.

(4) Based on this edit distance between process event sequences, this paper solves the noise classification problem for process event logs by decomposing it into multiple binary classification problems. Each binary problem is solved with the k-nearest-neighbors (KNN) algorithm, and the results together determine the final label set of an event sequence whose labels are unknown.

(5) This paper implements the approach as a ProM plug-in. The tool is used to conduct an experimental evaluation on real-world event logs, the results of which demonstrate both the effectiveness and the efficiency of the approach.
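The distance and classification steps described in items (3) and (4) can be illustrated with a minimal Python sketch. It is not the thesis's implementation (which is a ProM plug-in) and it makes several assumptions: the names `edit_distance`, `predict_labels`, `LABELS`, and the `concurrent` parameter are hypothetical; concurrency is modelled by letting two adjacent events that are declared concurrent swap places at zero cost, which is only one plausible reading of "considering the concurrency relationship"; each binary decision uses a simple majority vote among the k nearest training sequences; and the clustering stage is omitted, so the whole training set is searched.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Trace = Sequence[str]
# Deviation types named in the abstract; one binary decision is made per type.
LABELS = ("redundant", "missing", "dislocated")


def edit_distance(a: Trace, b: Trace,
                  concurrent: Iterable[FrozenSet[str]] = frozenset()) -> int:
    """Levenshtein distance over events.

    Assumption for this sketch: swapping two adjacent events whose pair is
    listed in `concurrent` costs nothing.
    """
    pairs = set(concurrent)
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all events of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all events of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                # deletion
                          d[i][j - 1] + 1,                # insertion
                          d[i - 1][j - 1] + substitution)
            swapped = (i > 1 and j > 1
                       and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1])
            if swapped and frozenset((a[i - 1], a[i - 2])) in pairs:
                d[i][j] = min(d[i][j], d[i - 2][j - 2])   # free concurrent swap
    return d[n][m]


def predict_labels(trace: Trace,
                   training: List[Tuple[Trace, Set[str]]],
                   k: int = 3,
                   concurrent: Iterable[FrozenSet[str]] = frozenset()) -> Set[str]:
    """Predict a label set with one KNN majority vote per deviation type."""
    pairs = set(concurrent)
    neighbours = sorted(
        training, key=lambda item: edit_distance(trace, item[0], pairs))[:k]
    predicted = set()
    for label in LABELS:
        votes = sum(1 for _, labels in neighbours if label in labels)
        if 2 * votes > len(neighbours):  # strict majority of the neighbours
            predicted.add(label)
    return predicted


# Toy training set (invented for illustration): the clean trace is a, b, c, d.
TRAINING = [
    (list("abcd"), set()),
    (list("abbcd"), {"redundant"}),
    (list("abccd"), {"redundant"}),
    (list("aabcd"), {"redundant"}),
    (list("acd"), {"missing"}),
    (list("abd"), {"missing"}),
    (list("bcd"), {"missing"}),
]

if __name__ == "__main__":
    print(predict_labels(list("abbccd"), TRAINING))
    print(edit_distance(list("abcd"), list("acbd"),
                        concurrent={frozenset("bc")}))
```

Because the label set is built from independent binary votes, a sequence can receive several deviation types at once or none at all. Restricting `training` to the sequences of a few selected clusters, as item (2) recommends, would reduce the number of distance computations per query.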
Keywords/Search Tags: classification, noise, event sequence cluster, process event log, process mining