
The Erlang(k) Bandit Sampling Process

Posted on: 2007-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
Full Text: PDF
GTID: 2190360215986462
Subject: Probability theory and mathematical statistics
Abstract/Summary:
This thesis considers Bandit sampling processes, which comprise two basic models: Bandit reward processes and Bandit target processes. Gittins studied Bandit sampling processes based on common distributions such as the Bernoulli distribution and the negative exponential distribution. Building on Gittins' results, more widely applicable Bandit sampling processes are investigated by means of dynamic-programming backward induction and the Bayesian approach. The main contributions of this thesis are as follows.

Firstly, important properties of several special Bandit sampling processes are examined. The posterior distribution of the unknown parameter, the conditional distribution of the sampling values, and the sampling-reward function are derived, and their monotonicity is discussed.

Secondly, the optimal decision problem of the Erlang(k) Bandit reward process, whose sampling values follow an Erlang(k) distribution, is discussed. An algorithm is constructed to calculate the sequence of break-even values characterizing the optimal selection of arms, and the asymptotic properties of the Gittins index and of the break-even values are studied. The optimal decision problem of the Erlang(k) Bandit reward process can therefore be solved effectively. Replacing the negative exponential distribution with an Erlang(k) distribution is more practical and extends the Bandit reward process.

Thirdly, the optimal decision problem of the Erlang(2) Bandit target process, whose sampling values follow an Erlang(2) distribution, is studied, and an algorithm is presented to calculate the sequence of break-even values characterizing the optimal selection of arms. Replacing the negative exponential distribution with an Erlang(2) distribution is more practical and extends the Bandit target process.

Lastly, random sampling times are incorporated into the models. To date, almost all Bandit processes have assumed uniform or geometric discounting rather than real-time discounting, which is more appropriate in some practical settings. A special Bandit reward process with unrestricted switching times is examined, in which the random sampling times follow a negative exponential distribution and the sampling values follow an Erlang(2) distribution. The monotonicity of the Gittins index of this process is discussed, and an algorithm is constructed to compute its optimal stopping times. Considering random sampling times (real-time discounting) is more practical, and using an Erlang(2) distribution supplements and extends Bandit reward processes.
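A minimal sketch of the Bayesian updating behind the first part, under an assumed conjugate setup (not necessarily the exact formulation of the thesis): if the unknown rate theta of the Erlang(k) sampling values is given a Gamma(a, b) prior (shape a, rate b), the posterior after n observations is Gamma(a + n*k, b + sum of observations), and the posterior mean of one sampling reward E[k/theta] equals k*b'/(a'-1) whenever a' > 1. The Python code below is only an illustration; the function names and the prior parameters are hypothetical.

def erlang_posterior(a, b, k, samples):
    # Gamma(a, b) prior on the unknown Erlang(k) rate theta; the likelihood of
    # n samples is proportional to theta**(n*k) * exp(-theta * sum(samples)),
    # so the posterior is again a gamma distribution.
    return a + k * len(samples), b + sum(samples)

def expected_sampling_reward(a, b, k):
    # Posterior mean of one sampling value, E[k/theta] = k*b/(a-1); finite only for a > 1.
    if a <= 1:
        raise ValueError("posterior shape must exceed 1 for a finite mean")
    return k * b / (a - 1)

# Illustrative use for an Erlang(2) arm with a hypothetical Gamma(3, 2) prior.
a_post, b_post = erlang_posterior(3.0, 2.0, k=2, samples=[1.4, 0.9])
print(a_post, b_post, expected_sampling_reward(a_post, b_post, k=2))

Each new observation increases the shape parameter by k and the rate parameter by the observed value, which is the kind of monotone posterior evolution the first part of the thesis studies.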
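As a small illustration of the real-time discounting in the last part (a sketch under stated assumptions, not the thesis's own construction): when the random sampling time T of a pull is exponentially distributed with rate mu and rewards are discounted continuously at rate c, the expected per-sample discount factor is E[exp(-c*T)] = mu/(mu + c). The Monte Carlo check below, with hypothetical parameter values, verifies this identity numerically.

import math
import random

def mc_discount_factor(mu, c, n=200000, seed=0):
    # Monte Carlo estimate of E[exp(-c*T)] for T ~ Exp(mu).
    rng = random.Random(seed)
    return sum(math.exp(-c * rng.expovariate(mu)) for _ in range(n)) / n

mu, c = 2.0, 0.5
print(mc_discount_factor(mu, c), mu / (mu + c))  # the two numbers should agree closely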
Keywords/Search Tags: Bandit sampling process, Gittins index, break-even values, Bayesian approach, Erlang(k) distribution