| With the advent of the big data era, many industries have put forward numerous big data application requirements. Many of these applications have relatively simple requirements and similar functions, and can usually be expressed as a combination of reusable big data computing units. In response, service composition technology, which offers good flexibility and simplicity, has been applied to the development of big data applications that are process-oriented and built from reusable functions. Under this approach, a big data application is expressed as a service composition model centered on big data processing services, and its functions are realized when the service composition engine interprets and executes the model. The service composition engine has therefore become key to applying service composition technology in the big data field. However, traditional service composition engines are often limited to centralized or quasi-distributed operation and are difficult to adapt to the current, mainly distributed, big data technology environment. In particular, the large-scale data flow execution control and optimized scheduling involved in service composition require specific support, and these engines lack integration with typical big data technology environments such as Hadoop. To address these problems, this article carries out the following main work on a distributed service composition engine for big data applications: 1. To address data flow execution control and big data environment integration in the execution of big data service composition applications, a service composition engine architecture for a distributed execution environment is designed. First, at the model level, the existing big data service composition model is extended around data flow execution control and its detailed definition is
given. Second, based on an analysis of the execution requirements and key issues of the big data service composition model in a distributed environment, a distributed service composition engine architecture and its core modules are designed, taking the big data execution environment and its processing costs into account. 2. To address data flow optimization scheduling in the execution of big data service composition applications, a data-aware service composition execution scheduling algorithm is designed and implemented. The algorithm first divides big data application task instances into four groups based on business and data characteristics: business-constrained and data-intensive, non-business-constrained and data-intensive, business-constrained and non-data-intensive, and non-business-constrained and non-data-intensive. It then uses an improved particle swarm algorithm to dynamically match the task instances in each group with distributed computing resources, reducing data transmission over the network and shortening the overall execution time of big data applications. Simulation experiments based on WorkflowSim show that the algorithm performs better than related scheduling algorithms. 3. Based on the above research, a distributed service composition engine for big data applications is designed and implemented on top of Flowable, an open-source business process engine. First, the core mechanisms and processes of the distributed service composition engine are designed; then the core database of the execution engine is defined. Finally, the Flowable engine is extended with distributed execution and with big data-specific tasks such as MapReduce (MR) tasks and Flink tasks. In addition, the distributed service composition engine is verified with an application case of expressway toll big data statistical analysis, demonstrating its practical effect. |
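
The four-way grouping and swarm-based matching described in item 2 can be illustrated with a minimal Python sketch. It is not the improved particle swarm algorithm of this work. It uses a simplified discrete swarm search, and every name in it (`Task`, `DATA_INTENSIVE_MB`, `group_tasks`, `makespan`, `pso_assign`), the size threshold, and the bandwidth figure are assumptions made only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task instance; field names are illustrative assumptions."""
    name: str
    business_constrained: bool  # assumed flag for a business constraint
    data_mb: float              # size of the input data in MB
    data_node: int              # index of the node that already holds the data
    compute_s: float            # pure compute time in seconds


DATA_INTENSIVE_MB = 1024.0  # assumed threshold, not taken from the source


def group_tasks(tasks):
    """Split tasks into up to four groups keyed by
    (business_constrained, data_intensive)."""
    groups = {}
    for t in tasks:
        key = (t.business_constrained, t.data_mb >= DATA_INTENSIVE_MB)
        groups.setdefault(key, []).append(t)
    return groups


def makespan(tasks, assignment, n_nodes, bandwidth_mb_s=100.0):
    """Finish time of the busiest node. A task placed away from its data
    pays an extra transfer time of data_mb / bandwidth_mb_s."""
    loads = [0.0] * n_nodes
    for t, node in zip(tasks, assignment):
        transfer = 0.0 if node == t.data_node else t.data_mb / bandwidth_mb_s
        loads[node] += t.compute_s + transfer
    return max(loads)


def pso_assign(tasks, n_nodes, particles=20, iters=50, seed=0):
    """Simplified discrete swarm search: each particle is a task-to-node list;
    each entry drifts toward the global best, the particle's own best,
    or a random node."""
    rng = random.Random(seed)
    cost = lambda p: makespan(tasks, p, n_nodes)
    swarm = [[rng.randrange(n_nodes) for _ in tasks] for _ in range(particles)]
    pbest = [p[:] for p in swarm]
    gbest = min(pbest, key=cost)[:]
    for _ in range(iters):
        for i, pos in enumerate(swarm):
            for d in range(len(tasks)):
                r = rng.random()
                if r < 0.4:
                    pos[d] = gbest[d]             # social pull
                elif r < 0.7:
                    pos[d] = pbest[i][d]          # cognitive pull
                elif r < 0.8:
                    pos[d] = rng.randrange(n_nodes)  # exploration
                # otherwise keep the current node (inertia)
            if cost(pos) < cost(pbest[i]):
                pbest[i] = pos[:]
                if cost(pos) < cost(gbest):
                    gbest = pos[:]
    return gbest


if __name__ == "__main__":
    demo = [
        Task("clean", True, 2048.0, 0, 10.0),
        Task("join", False, 4096.0, 1, 20.0),
        Task("report", True, 10.0, 2, 5.0),
    ]
    for key, members in group_tasks(demo).items():
        plan = pso_assign(members, n_nodes=3)
        print(key, [(t.name, node) for t, node in zip(members, plan)])
```

The cost function penalizes placing a task away from its input data, which is one simple way to express the data-aware goal of reducing network transfer. Running the search once per group, as in the demo, mirrors the per-group matching described above, although the order and priority among groups are left open here.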