Design And Application Of Time Series Decision-making System Based On Reinforcement Learning

Posted on: 2019-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Cheng
Full Text: PDF
GTID: 1360330623963926
Subject: Control Science and Control Engineering
Abstract/Summary:
In recent years, machine learning and data mining have been among the most actively studied topics for researchers and industry practitioners in China and abroad. Reinforcement learning, one of the three main types of machine learning algorithms, has steadily attracted attention from both groups. Compared with the achievements of supervised learning in fields such as image recognition and natural language processing, reinforcement learning, as a general framework for time series decision-making systems, still has largely untapped optimization potential, incomplete theory, and underdeveloped engineering applications. One reason is that reinforcement learning is theoretically more complex than supervised learning, its applicable scenarios differ more widely, and its implementations vary more. A more important reason is that designers of decision-making systems lack a unified understanding of the various reinforcement learning algorithms and overlook the differences in nature among sequential decision-making tasks. Worse, they often treat reinforcement learning in isolation from other types of algorithms, giving up an effective way to improve its performance. Together, these factors lead to common problems such as slow training, short-sighted behavior, and abnormal behavior in sequential decision-making systems built on reinforcement learning, so the systems struggle to achieve the effect their designers expect and are rarely successful in engineering applications. Studying how to design a time series decision-making system that makes effective use of interaction information according to the characteristics of different tasks therefore has important theoretical significance and broad industrial application value.

This thesis addresses time series decision-making tasks with different properties. It studies the design of time series decision-making systems, together with application cases, in three situations: the agent's interaction mechanism is known, only the agent's interaction data are known, and the agent's interaction model is unknown. It also analyzes how new reward signal sequences can be acquired in decision-making tasks. Depending on whether these sequences can be generated from an explicit expression, must be inferred from historical data, or cannot be obtained at all, a model-based, data-driven, or model-free method is adopted respectively. In every case the design goal is to maximize the expected long-term return, and a design method for the time series decision-making system is obtained by selecting an appropriate reinforcement learning algorithm and defining a reasonable reward function.

The design methods share the following characteristics. First, the decision-making system can embed prior knowledge, which reduces training cost and improves actual decision-making performance. Second, the system enters the application stage only after training, with its parameters fixed, and needs only a single forward computation per decision, so it offers stable performance, easy deployment, and good real-time behavior. Third, the design methods for the three situations are not limited to the application cases given for each of them. For decision-making tasks of a given nature, the proposed method is generally applicable to scenarios that have different decision-making objects but similar ways of acquiring reward signals.

The main research contributions of this thesis are:

1. This thesis proposes an information viewpoint of reinforcement learning, which describes different types of reinforcement learning algorithms from a single perspective. This unified view helps to see past surface differences to the essential differences between algorithms, to identify their common features, and to compare how strongly each algorithm depends on interaction information.

2. For time series decision tasks in which the agent's interaction mechanism is known, this thesis proposes a design method that embeds knowledge in the reward function through goal decomposition and fuzzy reasoning. The method targets nonlinear systems with continuous state spaces and uses autonomous obstacle avoidance and escape from tracking for unmanned ships as application examples. Mechanism modeling is used to describe the dynamics of the research object, and an interactive training platform is built on the mechanism model to generate sufficient experience samples. According to the actual needs of the task, sub-goal potential functions are designed for reward shaping. Fuzzy reasoning is introduced to combine the decomposed sub-goals, which embeds human knowledge into the reinforcement learning system and effectively improves the learning performance of the agent. In addition, the knowledge base can be expanded as needed, so the method offers good extensibility, strong interpretability, and an intuitive engineering background.

3. For time series decision tasks in which the agent's interaction mechanism is unknown but interaction information is easy to obtain, this thesis proposes a design method for a deep reinforcement learning decision system based on a time series analysis framework. The method targets nonlinear systems that have large amounts of data, are difficult to analyze mechanistically, and are difficult to learn online for safety reasons. It takes combustion optimization of large-scale power plant boiler units as the background. Data are collected through the distributed control system and used to construct a simulator of the system dynamics. A convolutional neural network integrates data of different attributes in space, and a recurrent neural network learns how the system changes over time. The method then encodes motion vectors and trains the decision maker with a discrete reinforcement learning algorithm. The resulting decision-making system suits scenarios where the mechanism is difficult to master but empirical sample data are easy to obtain, and it provides a simple solution for data-driven time series decision-making tasks.

4. For time series decision-making tasks in which the agent's interaction model is unknown and the state cannot be predicted from interaction information, this thesis proposes a design method for a model-free, continuous time series decision-making system. The method targets scenarios with an unknown environment model, unpredictable state transitions, and continuous decision-making, and takes real-time bidding for Internet advertising as the background. It brings fragmentary decisions into a continuous decision-making framework by viewing them from the perspective of probability distributions. A strategy initialization method embeds prior knowledge, narrows the scope of the agent's exploration, and speeds up learning. An auxiliary potential function alleviates the sparsity of reward signals, and the decision task is solved with a differentiable semi-gradient algorithm. The resulting decision-making system does not depend on a full understanding of the environment model and is not limited by the interval length of the decision sequence. It can optimize from a longer-term perspective without giving excessive weight to gains and losses within any single period, and so brings more lasting overall benefit.
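Contributions 2 and 4 above both rely on potential functions to shape the reward. The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the standard potential-based shaping term, gamma * Phi(s') - Phi(s), added to the environment reward. The 2-D state, the two sub-goal potentials, the fixed weights, and all function names are hypothetical illustrations; in particular, the fixed weights stand in for whatever combination the thesis derives through fuzzy reasoning.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]          # hypothetical 2-D position (x, y)
Potential = Callable[[State], float]


def goal_potential(state: State, goal: State = (10.0, 10.0)) -> float:
    """Hypothetical sub-goal potential: grows (toward 0) as the agent nears the goal."""
    return -math.dist(state, goal)


def obstacle_potential(state: State, obstacle: State = (5.0, 5.0),
                       safe_radius: float = 2.0) -> float:
    """Hypothetical sub-goal potential: 0 outside the safe radius, negative inside it."""
    return min(math.dist(state, obstacle), safe_radius) - safe_radius


def make_combined_potential(potentials: Sequence[Potential],
                            weights: Sequence[float]) -> Potential:
    """Weighted sum of sub-goal potentials (fixed weights are an assumption here)."""
    if len(potentials) != len(weights):
        raise ValueError("potentials and weights must have the same length")

    def combined(state: State) -> float:
        return sum(w * phi(state) for phi, w in zip(potentials, weights))

    return combined


def shaped_reward(reward: float, state: State, next_state: State,
                  potential: Potential, gamma: float = 0.99,
                  terminal: bool = False) -> float:
    """Environment reward plus the shaping term gamma * Phi(s') - Phi(s).

    The potential of a terminal next state is taken as 0.
    """
    next_value = 0.0 if terminal else potential(next_state)
    return reward + gamma * next_value - potential(state)


if __name__ == "__main__":
    phi = make_combined_potential([goal_potential, obstacle_potential], [1.0, 0.5])
    # One step that moves the agent closer to the goal, away from the obstacle.
    print(shaped_reward(0.0, (10.0, 6.0), (10.0, 7.0), phi))
```

Because the shaping term is a difference of potentials, it rewards progress toward the sub-goals on every step, which is one common way to ease the sparse-reward problem mentioned in contribution 4.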
Keywords/Search Tags: Reinforcement Learning, Time Series, Decision-making System, Information Viewpoint