| We study an admission control problem in which the customer arrival rate is unknown and needs to be learned from data using Bayesian inference. Two key defining features of this model are that: (1) when the arrival rate is known, the DP equations can be solved explicitly to obtain the optimal policy over the infinite horizon, and (2) uninformative actions are unavoidable and occur infinitely often.;We extend the standard proof techniques for Thompson sampling to admission control, in which uninformative actions occur infinitely often, and show that asymptotically optimal convergence rates of the posterior error and worst-case average regret are achieved. Finally, we show that under simple assumptions, our techniques generalize to a broader class of policies, which we call Generalized Thompson sampling. We show that this class of policies achieves asymptotically optimal convergence rates and can outperform standard Thompson sampling in numerical simulation. |