| Financial portfolio optimization is the process of continuously distributing funds to different financial assets according to a certain strategy in order to obtain a greater cumulative return.With the development of deep reinforcement learning technology in the field of machine learning,it has become possible to use computers to continuously interact with the real environment for self-learning and optimize the investment portfolio through a large amount of calculations.In addition,financial market data has characteristics such as non-linear patterns and low signal-to-noise ratio,and deep reinforcement learning has played a certain advantage in it,so it has quite good application prospects.Aiming at the problem of dynamic optimization of investment portfolio,this paper shows a deep reinforcement learning solution framework that does not use traditional financial models.The framework includes Actor network,Critic network,experience replay mechanism and reward function for instant reward.The Markov decision process(MDP)is used to describe the research problem in this paper.The deep deterministic strategy gradient(DDPG)algorithm is used to avoid the problem of excessive data correlation and difficulty of algorithm convergence in reinforcement learning,that is,empirical playback and dual network The structure(including the current network and the target network)is combined to improve,so as to achieve a better model training effect,and provide machine learning solutions for portfolio optimization problems.Specifically,four different network structures are constructed based on a deterministic strategy gradient,and sample data is collected and iteratively updated through an experience playback mechanism.The agent is trained to continuously learn interactively with the environment,to simulate the investment trading behavior of the stock market,and to use convolutional neural.The network(CNN)fits the estimates of the strategy function and the value function to continuously optimize the network parameters.This paper uses the DDPG algorithm in deep reinforcement learning to construct a complete and feasible portfolio optimization strategy,which has strong applicability and promotion.In the empirical analysis part,this paper takes the Chinese stock market as an example,and selects 35 CSI 300 index stocks with lower correlation coefficients as the target of risk assets through nearest neighbor propagation(AP)clustering.The performance under a series of factors such as learning rate,different feature combinations,etc.Select the best performing feature parameters to train the model,and set up a portfolio based on the traditional static Markowitz model(minimum risk combination and optimal Sharpe ratio combination)as a control group to verify the validity of the model.The results show that the dynamic portfolio strategy based on deep reinforcement learning method has the largest average daily rate of return and Sharpe ratio in the out-of-sample interval compared to the other two control groups.Among them,the average daily rate of return is 0.493%,which is about 2.6 times the minimum risk portfolio and 1.5 times the optimal Sharpe ratio portfolio;the Sharpe ratio is 1.379,which is about 1.6 times the minimum risk portfolio and 1.2 times the optimal Sharpe ratio portfolio.The results show that the portfolio strategy obtained by the DDPG algorithm is superior to the traditional Markowitz portfolio model in model effectiveness,and therefore the effectiveness of the deep reinforcement learning optimization strategy constructed in this paper is verified. |