| In service-oriented architectures, service composition, that is, combining existing simple services into value-added services that meet user needs, has become a research hot spot of great practical value. With the rapid development of web service technology, large numbers of web services with the same functionality but different non-functional attributes (such as QoS) have emerged. As a result, service filtering and the optimization of service composition have become major challenges. On the one hand, a composition system must guarantee the basic functions of the composite business flow and, on that basis, select an appropriate service for each task so as to achieve the optimal result (maximal QoS) in large-scale service scenarios. On the other hand, network-based web services are inherently dynamic, and the environment in which services are composed is complex and unstable. Web service composition should therefore be adaptive, adjusting promptly to changes in the environment and to the evolution of the services themselves. In view of these challenges, how a web service composition system can remain accurate and efficient in large-scale dynamic scenarios is a problem that urgently needs to be solved. This paper focuses on adaptive service composition in large-scale scenarios and makes the following contributions: (1) Because the complexity of service composition and the diversity of candidate services enlarge the scale of the composition problem, the paper proposes an adaptive service composition scheme based on deep reinforcement learning. A form of deep learning, the recurrent neural network, is used to improve the reinforcement learning algorithm: the network predicts the objective function and enhances the algorithm's expressive and generalization ability. The scheme effectively addresses the shortcomings of traditional reinforcement learning when facing massive or continuous state spaces, and has high application value in large-scale dynamic service composition scenarios. (2) The paper
uses a heuristic behavior selection strategy in which the state set is divided into a hidden state set and a fully observable state set, so that a targeted behavior selection policy is applied to each type of state. This classification, which simulates the policy space for hidden states and the evaluation function for fully observable states, further improves the accuracy and efficiency of the composition results. (3) A series of experiments is conducted to verify the effectiveness, scalability, and self-adaptivity of the approach. The experimental results further show that the proposed methods outperform traditional reinforcement learning in both composition quality and efficiency. |
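
For orientation, the traditional reinforcement learning baseline mentioned in the abstract can be sketched as tabular Q-learning over a sequential workflow. This is a minimal illustration under stated assumptions, not the scheme proposed in the paper: the function names, the `qos` utility values, and the hyperparameters are all hypothetical, and each task is treated as one state with its candidate services as actions. Because the table needs one row per state, it grows with the state space, which is the limitation the abstract attributes to traditional reinforcement learning and addresses with a recurrent neural network.

```python
import random


def q_learning_composition(qos, episodes=2000, alpha=0.1, gamma=0.9,
                           epsilon=0.2, seed=0):
    """Tabular Q-learning over a sequential workflow (illustrative only).

    qos[t][a] is a hypothetical aggregated QoS utility (higher is better)
    of candidate service a for task t.
    """
    rng = random.Random(seed)
    n_tasks = len(qos)
    # One Q-row per task (state), one entry per candidate service (action).
    q = [[0.0] * len(candidates) for candidates in qos]
    for _ in range(episodes):
        for t in range(n_tasks):
            n_actions = len(qos[t])
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[t][i])
            reward = qos[t][a]
            # The state after the last task is terminal, so its value is 0.
            future = max(q[t + 1]) if t + 1 < n_tasks else 0.0
            q[t][a] += alpha * (reward + gamma * future - q[t][a])
    return q


def greedy_plan(q):
    """Pick the highest-valued candidate service for every task."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]


# Hypothetical workflow: three tasks with 3, 2, and 3 candidate services.
qos = [
    [0.2, 0.9, 0.5],
    [0.7, 0.1],
    [0.3, 0.4, 0.8],
]
q_table = q_learning_composition(qos)
plan = greedy_plan(q_table)
print(plan)
```

In this toy setting the learned plan simply selects the candidate with the highest utility for each task. A realistic composition scenario would add noisy or time-varying QoS observations and a far larger state space, which is where a function approximator in place of the table becomes relevant.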