| In the era of big data, how to find and extract higher-value information from large volumes of data has received increasing attention. Researchers in both academia and industry have invested in this problem, hoping to create greater value. In the course of this research, however, acquiring the research objects themselves (i.e., large data sets) has become a new challenge. Factors such as national security, trade secrets, and privacy protection mean that government agencies and large Internet companies that own such data cannot disclose and share it freely, even when the data would be shared, in whole or in part, only with collaborators within the permitted scope of confidentiality. This is an obstacle for researchers who are keen to study complex event big data but do not have access to it. Complex Event Processing (CEP), a data processing technique that arose from the needs of streaming data processing, performs well on diverse, streaming data and is widely used in Complex Event Big data Processing (CEBP) systems. To address the difficulty of obtaining complex event big data, this paper proposes a method for generating such data based on Bayesian networks, drawing on the characteristics of complex event big data and using CEP technology as a reference. The method takes part of the real sample data as its research object, combines the experience of experts in related fields, defines a complex event model, and uses algebraic expressions to describe the specific event relationships in the data set, such as causality, sequence, selection, and collaboration. The complex event big data stream that these basic events continuously generate, feed, and aggregate into is called the CEBP data flow or CEBP event flow. The CEBP data flow is then combined with a Bayesian network: nodes describe complex events, edges describe the relationships between the nodes, and conditional probability tables are used to 
describe the probability of events occurring under different conditions, so that the characteristics of events in the data set are captured clearly and accurately. Based on the CEBP Bayesian network data flow model, and with reference to the proportions and probability distributions of the different events in the data set and the relationships between them, the event model of the sample data set is expanded and the corresponding probability tables are updated. Complex event data sets of different scales are then generated according to the model and the requirements. Taking route data and GPS data as experimental objects, this paper designs and implements a Bayesian network modeling algorithm for complex data and a complex data generation algorithm. The generated complex event data sets are evaluated with comparative analysis and similarity analysis, respectively: compared with the sample data set, the 100 generated routing data sets maintain stable proportions among event relationships, and the 10 generated GPS data sets have a similarity above 0.7. Finally, a complex data generation tool was built. The experimental results show that the proposed method is feasible and effective to a certain degree. |
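
As an illustration of the generation step summarized above, the following minimal Python sketch samples synthetic event records from a small Bayesian network by visiting nodes in topological order and drawing each event from its conditional probability table. The three events `A`, `B`, and `C`, their edges, all probabilities, and the helper names (`sample_record`, `generate`, `occurrence_ratio`) are hypothetical placeholders rather than values or algorithms from the paper, whose actual implementation may differ.

```python
import random

# Hypothetical network (illustrative only): A -> B, A -> C, B -> C.
# Each node lists its parents and a conditional probability table (CPT)
# mapping the parents' values to P(event occurs).
NETWORK = {
    "A": {"parents": (), "cpt": {(): 0.6}},
    "B": {"parents": ("A",), "cpt": {(True,): 0.9, (False,): 0.2}},
    "C": {
        "parents": ("A", "B"),
        "cpt": {
            (True, True): 0.7,
            (True, False): 0.4,
            (False, True): 0.3,
            (False, False): 0.05,
        },
    },
}
ORDER = ["A", "B", "C"]  # topological order: parents come before children


def sample_record(rng):
    """Draw one record: for each event, whether it occurs given its parents."""
    record = {}
    for node in ORDER:
        spec = NETWORK[node]
        key = tuple(record[p] for p in spec["parents"])
        record[node] = rng.random() < spec["cpt"][key]
    return record


def generate(n, seed=0):
    """Generate n synthetic records; n sets the scale of the data set."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


def occurrence_ratio(records, node):
    """Fraction of records in which the given event occurs."""
    return sum(r[node] for r in records) / len(records)
```

Calling `generate(n)` with different values of `n` yields data sets of different scales from the same model, and `occurrence_ratio` gives a simple way to compare event proportions between a generated data set and a sample data set.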