
Dynamic Allocation of Non-Standard Multi-Armed Bandits

Posted on: 2017-02-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Q Bao
Full Text: PDF
GTID: 1220330485972984
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The main purpose of this paper is to extend multi-armed bandit (MAB) models with index policies, making them more realistic: each arm may have its own restricted switching times; each arm may have its own discount process; and breakdowns may be of the preemptive-repeat type with incomplete information. To this end, we first develop the related theory of optimal stopping problems and nonparametric Bayesian methods. In view of the above purpose, the details are as follows.

To discuss optimal stopping for a regular family of variables indexed by partially available stopping times, we use classical probability theory to obtain general conclusions that subsume the classical continuous-time, discrete-time, and semi-Markov settings as special cases. Firstly, for a family of variables with a single stopping-time index, we introduce the set of allowable stopping times to establish a restricted optimal stopping model, characterize the two families of value functions, establish necessary and sufficient conditions for the existence of optimal stopping times, characterize the minimal and maximal optimal stopping times, and discuss properties of the local family of value variables, such as regularity. Secondly, the restricted optimal stopping problem is extended to the case of double indices, and the results can be generalized to the multi-index case. Thirdly, as a by-product, we derive properties of the countable decomposition of an accessible set.

For the dynamic allocation of a continuous-time MAB, we treat the case in which each of the stochastically independent arms is associated with a random time set, and switching from one arm to another is allowed only when the processing time of the current arm belongs to that restriction set; the objective is to maximize the total expected present value of the bandit system.
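The restricted stopping idea above can be illustrated in a much simpler setting than the dissertation's continuous-time framework. The following minimal sketch (the function name `restricted_snell` and the binomial-walk example are illustrative assumptions, not the author's construction) computes the Snell envelope by backward induction on a binomial random-walk lattice, where stopping is permitted only at times in a prescribed set `allowed`:

```python
def restricted_snell(payoff, p, T, allowed):
    """Value of optimally stopping a binomial random walk by horizon T when
    stopping is permitted only at times in `allowed`.

    payoff(t, s): reward for stopping at time t with walk level s.
    p: probability of an up-step.  T must lie in `allowed` so that at least
    one stopping time exists.
    """
    assert T in allowed, "the horizon must be an allowed stopping time"
    # Terminal condition: at time T we must stop.
    V = {s: payoff(T, s) for s in range(-T, T + 1, 2)}
    for t in range(T - 1, -1, -1):
        # Continuation value: one-step conditional expectation of V.
        cont = {s: p * V[s + 1] + (1 - p) * V[s - 1]
                for s in range(-t, t + 1, 2)}
        if t in allowed:
            # Snell envelope step: stop if the immediate reward beats continuing.
            V = {s: max(payoff(t, s), cont[s]) for s in cont}
        else:
            # Stopping forbidden here: the envelope is just the continuation value.
            V = cont
    return V[0]
```

Shrinking `allowed` can only shrink the value, which mirrors the comparison between classical and restricted optimal stopping in the text.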
By introducing the restricted time set, the classical theory of optimal stopping for stochastic processes is first generalized to allow restricted stopping times. Following the ideas of Kaspi and Mandelbaum (1998), the results are then applied to deduce the relation between the Gittins index process of a single arm with restricted stopping times and its instantaneous reward-rate process. Finally, the optimality of a Gittins index policy is verified using the excursion method of Kaspi and Mandelbaum (1998). New techniques are also introduced, so that the new proof is considerably simpler than those in the literature.

Further, we consider the extended model in which each arm has a non-uniform discount process. After selecting two kinds of expected total discounted reward, we define appropriate indices via the theory of restricted optimal stopping in continuous time, and show that one of the index policies maximizes its objective function while the other does not.

For stochastic scheduling subject to preemptive-repeat breakdowns with incomplete information, using Bayesian methods and taking the expected discounted reward (EDR) as the objective function, we characterize the optimal index strategies under static and dynamic policies respectively, in particular the one-step reward rates of the dynamic policy under different Bayesian frameworks. For the static policy, the results in the general framework are similar to those in the parametric framework; for the dynamic policy, by analyzing the relationship between one-step rewards and the Bayesian framework, we find that different Bayesian frameworks have different impacts on the Gittins index.
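To make the Gittins index and the retirement method (listed in the keywords) concrete, here is a hedged discrete-time sketch for the textbook Bayesian Bernoulli bandit — not the dissertation's continuous-time restricted setting. It uses Whittle's retirement formulation: find the retirement reward M at which the player is indifferent between retiring and continuing; the index is then (1 − β)M. The function name `gittins_index_beta` and the horizon-truncation approximation are assumptions of this sketch:

```python
def gittins_index_beta(a0, b0, beta=0.9, horizon=50, tol=1e-3):
    """Approximate Gittins index of a Bernoulli arm with Beta(a0, b0) posterior
    and discount factor beta, by bisection on the retirement reward.

    horizon truncates the dynamic program, so the result is an approximation.
    """
    def root_value(M):
        # Value of the arm with retirement option M, by memoized recursion
        # over posterior states (a, b); depth = pulls taken so far.
        memo = {}
        def V(a, b):
            if (a - a0) + (b - b0) >= horizon:
                return M  # truncation: forced retirement
            if (a, b) not in memo:
                p = a / (a + b)  # posterior mean of the success probability
                cont = (p * (1 + beta * V(a + 1, b))
                        + (1 - p) * beta * V(a, b + 1))
                memo[(a, b)] = max(M, cont)  # retire or continue
            return memo[(a, b)]
        return V(a0, b0)

    lo, hi = 0.0, 1.0  # the index of a Bernoulli arm lies in [0, 1]
    while hi - lo > tol:
        lam = (lo + hi) / 2
        M = lam / (1 - beta)  # retirement reward equivalent to rate lam forever
        if root_value(M) > M + 1e-12:
            lo = lam  # continuing strictly beats retiring: index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2
```

The index exceeds the posterior mean a0/(a0 + b0) — the gap is the exploration bonus — which is the discrete-time shadow of the index/reward-rate relation discussed above.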
Keywords/Search Tags: Optimal stopping times, restricted stopping times, Snell's envelope, multi-armed bandit process, Gittins index, retirement method, excursion theory, Bayesian method